Abstract

We consider the problem of recovering two unknown vectors, w and x, of length L from
their circular convolution. We make the structural assumption that the two vectors are members
of known subspaces, one with dimension N and the other with dimension K. Although the observed
convolution is nonlinear in both w and x, it is linear in the rank-1 matrix formed by their outer
product wx*. This observation allows us to recast the deconvolution problem as a low-rank matrix
recovery problem from linear measurements, whose natural convex relaxation is a nuclear norm
minimization program.

We prove the effectiveness of this relaxation by showing that for "generic" signals, the
program can deconvolve w and x exactly when the maximum of N and K is almost on the order
of L. That is, we show that if x is drawn from a random subspace of dimension N, and w is
a vector in a subspace of dimension K whose basis vectors are "spread out" in the frequency
domain, then nuclear norm minimization recovers wx* without error.

We discuss this result in the context of blind channel estimation in communications. If we
have a message of length N which we code using a random L x N coding matrix, and the
encoded message travels through an unknown linear time-invariant channel of maximum length
K, then the receiver can recover both the channel response and the message when $L \gtrsim N + K$,
to within constant and log factors.

Index terms: Blind deconvolution, low-rank matrix, compressed sensing, channel estimation, 
rank-1 matrix, image deblurring, convex programming, and nuclear norm minimization. 



1 Introduction 

This paper considers a fundamental problem in signal processing and communications: we observe
the convolution of two unknown signals, w and x, and want to separate them. We will show that
this problem can be naturally relaxed as a semidefinite program (SDP), in particular, a nuclear
norm minimization program. We then use this fact in conjunction with recent results on recovering
low-rank matrices from underdetermined linear observations to provide conditions under which w
and x can be deconvolved exactly. Qualitatively, these results say that if both w and x have length
L, w lives in a fixed subspace of dimension K and is spread out in the frequency domain, and x
lives in a "generic" subspace chosen at random, then w and x are separable with high probability.

The general statement of the problem is as follows. We will assume that the length L signals live
in known subspaces of $\mathbb{R}^L$ whose dimensions are K and N. That is, we can write

$$
w = Bh, \quad h \in \mathbb{R}^K, \qquad x = Cm, \quad m \in \mathbb{R}^N,
$$

for some L x K matrix B and L x N matrix C. The columns of these matrices provide bases for
the subspaces in which w and x live; recovering h and m, then, is equivalent to recovering w and
x.



We observe the circular convolution of w and x:

$$
y = w * x, \quad \text{or} \quad y[\ell] = \sum_{\ell'=1}^{L} w[\ell']\, x[\ell - \ell' + 1], \tag{1}
$$

where the index $\ell - \ell' + 1$ in the sum above is understood to be modulo {1, . . . , L}. It is clear
that without structural assumptions on w and x, there will not be a unique separation given the
observations y. But we will see that once we account for our knowledge that w and x lie in the
span of the columns of B and C, respectively, they can be uniquely separated in many situations.
Detailing one such set of conditions under which this separation is unique and can be computed by 
solving a tractable convex program is the topic of this paper. 
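
As a small numerical illustration of (1), the sketch below evaluates the circular convolution directly and compares it with the FFT-based product. It also shows the simplest ambiguity that makes the unconstrained problem ill-posed: rescaling w by a and x by 1/a leaves y unchanged. The function name `circular_convolve`, the length L = 8, and the random test vectors are illustrative choices, not part of the paper.

```python
import numpy as np

def circular_convolve(w, x):
    """Evaluate (1) directly; with 0-based indices, y[l] = sum_k w[k] * x[(l - k) mod L]."""
    L = len(w)
    return np.array([sum(w[k] * x[(l - k) % L] for k in range(L)) for l in range(L)])

rng = np.random.default_rng(0)
L = 8                                  # illustrative signal length
w = rng.standard_normal(L)
x = rng.standard_normal(L)

y_direct = circular_convolve(w, x)
# Circular convolution becomes a pointwise product after the DFT
y_fft = np.real(np.fft.ifft(np.fft.fft(w) * np.fft.fft(x)))
print(np.allclose(y_direct, y_fft))

# Scaling ambiguity: (a * w, x / a) produces the same observation as (w, x)
a = 3.0
y_scaled = circular_convolve(a * w, x / a)
print(np.allclose(y_direct, y_scaled))
```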

1.1 Matrix observations 



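We can break apart the convolution in (1) by expanding x as a linear combination of the columns
$C_1, \ldots, C_N$ of C,

$$
y = m(1)\, w * C_1 + m(2)\, w * C_2 + \cdots + m(N)\, w * C_N
  = \begin{bmatrix} \operatorname{circ}(C_1) & \operatorname{circ}(C_2) & \cdots & \operatorname{circ}(C_N) \end{bmatrix}
    \begin{bmatrix} m(1)w \\ m(2)w \\ \vdots \\ m(N)w \end{bmatrix},
$$

where $\operatorname{circ}(C_n)$ is the L x L circulant matrix whose action corresponds to circular
convolution with the vector $C_n$. Expanding w as a linear combination of the columns of B, this
becomes

$$
y = \begin{bmatrix} \operatorname{circ}(C_1)B & \operatorname{circ}(C_2)B & \cdots & \operatorname{circ}(C_N)B \end{bmatrix}
    \begin{bmatrix} m(1)h \\ m(2)h \\ \vdots \\ m(N)h \end{bmatrix}. \tag{2}
$$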



We will find it convenient to write (2) in the Fourier domain. Let F be the L-point normalized
discrete Fourier transform (DFT) matrix

$$
F(\omega, \ell) = \frac{1}{\sqrt{L}}\, e^{-i 2\pi (\omega - 1)(\ell - 1)/L}, \quad 1 \le \omega, \ell \le L.
$$

We will use $\hat C = FC$ for the C-basis transformed into the Fourier domain, and also $\hat B = FB$.
Then $\operatorname{circ}(C_n) = F^* \Delta_n F$, where $\Delta_n$ is a diagonal matrix constructed from the nth column of $\hat C$,
$\Delta_n = \operatorname{diag}(\sqrt{L}\, \hat C_n)$, and (2) becomes

$$
\hat y = Fy = \begin{bmatrix} \Delta_1 \hat B & \Delta_2 \hat B & \cdots & \Delta_N \hat B \end{bmatrix}
    \begin{bmatrix} m(1)h \\ m(2)h \\ \vdots \\ m(N)h \end{bmatrix}. \tag{3}
$$

Clearly, recovering $\hat y$ is the same as recovering y.
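
The diagonalization of a circulant matrix by the normalized DFT can be checked numerically. The following is a minimal sketch with 0-based indices; the size L = 8 and the random vector `c` (standing in for one column of C) are assumptions made only for this check.

```python
import numpy as np

L = 8                                   # illustrative size
rng = np.random.default_rng(1)
c = rng.standard_normal(L)              # stands in for one column C_n

# Normalized DFT matrix, F[omega, l] = exp(-2j*pi*omega*l/L) / sqrt(L) (0-based)
idx = np.arange(L)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / L) / np.sqrt(L)

# circ(c): column k is c cyclically shifted by k, so circ(c) @ w is the circular convolution of c and w
circ_c = np.column_stack([np.roll(c, k) for k in range(L)])

c_hat = F @ c                           # column of C_hat = F C
Delta = np.diag(np.sqrt(L) * c_hat)     # Delta_n = diag(sqrt(L) * C_hat_n)

print(np.allclose(F.conj().T @ Delta @ F, circ_c))
```

The factor sqrt(L) appears because F is normalized to be unitary, so a pointwise product of two normalized spectra must be rescaled to match the unnormalized convolution theorem.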



The expansions (2) and (3) make it clear that while $\hat y$ is a nonlinear combination of the coefficients
h and m, it is a linear combination of the entries of their outer product $X_0 = hm^*$. We can pose
the blind deconvolution problem as a linear inverse problem where we want to recover a K x N
matrix from observations

$$
\hat y = \mathcal{A}(X_0), \tag{4}
$$

through a linear operator $\mathcal{A}$ which maps K x N matrices to $\mathbb{R}^L$. For $\mathcal{A}$ to be invertible over all
matrices, we need at least as many observations as unknowns, $L \ge NK$. But since we know $X_0$
has special structure, namely that its rank is 1, we will be able to recover it from $L \ll NK$ under
certain conditions on $\mathcal{A}$.

As each entry of $\hat y$ is a linear combination of the entries in $hm^*$, we can write them as trace inner
products of different K x N matrices against $hm^*$. Using $\hat b_\ell \in \mathbb{C}^K$ for the $\ell$th column of $\hat B^*$ and
$\hat c_\ell \in \mathbb{C}^N$ as the $\ell$th row of $\sqrt{L}\hat C$, we can translate one entry in (3) as^1

$$
\begin{aligned}
\hat y(\ell) &= \hat c_\ell(1) m(1) \langle h, \hat b_\ell \rangle + \hat c_\ell(2) m(2) \langle h, \hat b_\ell \rangle + \cdots + \hat c_\ell(N) m(N) \langle h, \hat b_\ell \rangle \\
&= \langle \hat c_\ell, m \rangle \langle h, \hat b_\ell \rangle \\
&= \operatorname{trace}\left(A_\ell^* (hm^*)\right), \quad \text{where } A_\ell = \hat b_\ell \hat c_\ell^*.
\end{aligned} \tag{5}
$$



Now that we have seen that separating two signals given their convolution can be recast as a matrix
recovery problem, we turn our attention to a method for solving it. In the next section, we
argue that a natural way to recover the expansion coefficients m and h from measurements of the
form (3) is using nuclear norm minimization.
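
The lifting step can be made concrete with a short numerical sketch: the Fourier transform of the observed convolution equals a linear map applied to the outer product of h and m. The dimensions, the random bases, and the helper name `A_op` below are illustrative assumptions; the operator is written as $\hat y(\ell) = \hat b_\ell^* X \hat c_\ell$, which is the trace form in (5).

```python
import numpy as np

rng = np.random.default_rng(2)
L, K, N = 16, 3, 4                                   # illustrative sizes
B = np.linalg.qr(rng.standard_normal((L, K)))[0]     # L x K with orthonormal columns
C = rng.standard_normal((L, N)) / np.sqrt(L)         # L x N with Normal(0, 1/L) entries
h = rng.standard_normal(K)
m = rng.standard_normal(N)

idx = np.arange(L)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / L) / np.sqrt(L)
B_hat = F @ B                          # row l of B_hat is b_hat_l^*
C_hat_scaled = np.sqrt(L) * (F @ C)    # row l holds the entries of c_hat_l

def A_op(X):
    """Linear map X -> y_hat with y_hat[l] = b_hat_l^* X c_hat_l = trace(A_l^* X)."""
    return np.einsum('lk,kn,ln->l', B_hat, X, C_hat_scaled)

# Observation formed in the time domain, then transformed
w, x = B @ h, C @ m
y = np.real(np.fft.ifft(np.fft.fft(w) * np.fft.fft(x)))
y_hat = F @ y

print(np.allclose(A_op(np.outer(h, m)), y_hat))      # lifted measurement matches F y
```

Because `A_op` is linear in its matrix argument, the only nonlinearity in the original problem is confined to the rank-1 structure of the unknown.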



1.2 Convex relaxation 

The previous section demonstrated how the blind deconvolution problem can be recast as a linear
inverse problem over the (nonconvex) set of rank-1 matrices. A common heuristic to convexify the
problem is to use the nuclear norm, the sum of the singular values of a matrix, as a proxy for
rank [1]. In this section, we show how this heuristic provides a natural convex relaxation.

Given $\hat y \in \mathbb{C}^L$, our goal is to find $h \in \mathbb{R}^K$ and $m \in \mathbb{R}^N$ that are consistent with the observations
in (3). Making no assumptions about either of these vectors other than the dimension, the natural
way to choose between multiple feasible points is using least-squares. We want to solve

$$
\min_{u, v} \ \|u\|_2^2 + \|v\|_2^2 \quad \text{subject to} \quad \hat y(\ell) = \langle \hat c_\ell, u \rangle \langle v, \hat b_\ell \rangle, \quad \ell = 1, \ldots, L. \tag{6}
$$

^1 As we are now manipulating complex numbers in the frequency domain, we will need to take a little bit of care
with definitions. Here and below, we use $\langle u, v \rangle = v^* u = \operatorname{trace}(uv^*)$ for complex vectors u and v.

This is a non-convex quadratic optimization problem. The cost function is convex, but the quadratic
equality constraints mean that the feasible set is non-convex. A standard approach to solving such
quadratically constrained quadratic programs is to use duality (see for example [2]). A standard
calculation shows that the dual of (6) is the semidefinite program (SDP)

$$
\min_{\lambda} \ \operatorname{Re}\langle \hat y, \lambda \rangle \quad \text{subject to} \quad
\begin{bmatrix} I & \sum_{\ell=1}^{L} \lambda(\ell) A_\ell \\ \sum_{\ell=1}^{L} \lambda(\ell)^* A_\ell^* & I \end{bmatrix} \succeq 0, \tag{7}
$$

with the $A_\ell = \hat b_\ell \hat c_\ell^*$ defined as in the previous section. Taking the dual again will give us a convex
program which is in some sense as close to (6) as possible. The dual SDP of (7) is [3]



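$$
\min_{W_1, W_2, X} \ \tfrac{1}{2}\operatorname{trace}(W_1) + \tfrac{1}{2}\operatorname{trace}(W_2) \quad \text{subject to} \quad
\begin{bmatrix} W_1 & X \\ X^* & W_2 \end{bmatrix} \succeq 0, \quad \hat y = \mathcal{A}(X), \tag{8}
$$

which is completely equivalent to

$$
\min \ \|X\|_* \quad \text{subject to} \quad \hat y = \mathcal{A}(X). \tag{9}
$$

That is, the nuclear norm heuristic is the "dual-dual" relaxation of the intuitive but non-convex
least-squares estimation problem (6).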
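
One way to see the link between the semidefinite form and the nuclear norm is to check it on a rank-1 matrix. The sketch below (sizes and random vectors are illustrative assumptions) builds balanced factors u, v with $uv^T = hm^T$ and $\|u\| = \|v\|$, forms the block matrix from the semidefinite constraint, and confirms that the trace objective at this feasible point equals the nuclear norm $\|h\|_2 \|m\|_2$. It demonstrates attainment at one feasible point, not optimality over the whole feasible set.

```python
import numpy as np

rng = np.random.default_rng(3)
K, N = 4, 6                               # illustrative sizes
h = rng.standard_normal(K)
m = rng.standard_normal(N)
X0 = np.outer(h, m)

nuc = np.linalg.norm(X0, ord='nuc')       # sum of singular values of X0

# Balanced factors: u v^T = h m^T with ||u|| = ||v||
s = np.sqrt(np.linalg.norm(m) / np.linalg.norm(h))
u, v = s * h, m / s
W1, W2 = np.outer(u, u), np.outer(v, v)

# The block matrix equals [u; v][u; v]^T, so it is positive semidefinite by construction
block = np.block([[W1, X0], [X0.T, W2]])
objective = 0.5 * np.trace(W1) + 0.5 * np.trace(W2)

print(np.isclose(nuc, np.linalg.norm(h) * np.linalg.norm(m)))
print(np.isclose(objective, nuc))
print(np.linalg.eigvalsh(block).min() > -1e-10)
```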

Our technique for untangling w and x from their convolution, then, is to take the Fourier transform
of the observation y = w * x and use it as constraints in the program (9). That (9) is the natural
relaxation is fortunate, as an entire body of literature in the field of low-rank recovery has arisen
in the past five years that is devoted to analyzing problems of the form (9). We will build on some
of the techniques from this area in establishing the theoretical guarantees for when (9) is provably
effective, presented in the next section.

There have also been tremendous advances in algorithms for computing the solution to optimization
problems of both types (8) and (9). In Section 2.1, we will briefly detail one such technique we used
to solve (8) on a relatively large scale for a series of numerical experiments in Sections 2.2-2.4.



1.3 Main results 

We can guarantee the effectiveness of (9) for relatively large subspace dimensions K and N when B
is incoherent in the Fourier domain, and when C is generic. Before presenting our main analytical
result, Theorem 1 below, we will carefully specify our models for B and C, giving a concrete
definition to the terms 'incoherent' and 'generic' in the process.

We will assume, without loss of generality, that the matrix B is an arbitrary L x K matrix with
orthonormal columns:

$$
\hat B^* \hat B = B^* B = \sum_{\ell=1}^{L} \hat b_\ell \hat b_\ell^* = I, \tag{10}
$$

where the $\hat b_\ell$ are the columns of $\hat B^*$, as in (5). Our results will be most powerful when B is diffuse
in the Fourier domain, meaning that the $\hat b_\ell$ all have similar norms. We will use the (in)coherence
parameter $\mu_1$ to quantify the degree to which the columns of B are jointly concentrated in the
Fourier domain:

$$
\mu_1^2 = \frac{L}{K} \max_{1 \le \ell \le L} \|\hat b_\ell\|_2^2. \tag{11}
$$

From (10), we know that the total energy in the rows of $\hat B$ is $\sum_{\ell=1}^{L} \|\hat b_\ell\|_2^2 = K$, and that $\|\hat b_\ell\|_2^2 \le 1$.
Thus $1 \le \mu_1^2 \le L/K$, with the coherence taking its minimum value when the energy in $\hat B$ is evenly
distributed throughout its rows, and its maximum value when the energy is completely concentrated
on K of the L rows. Our results will also depend on the minimum of these norms,

$$
\mu_L^2 = \frac{L}{K} \min_{1 \le \ell \le L} \|\hat b_\ell\|_2^2. \tag{12}
$$

We will always have $0 \le \mu_L^2 \le 1$ and $\mu_L^2 \le \mu_1^2$. An example of a maximally incoherent B, where
$\mu_1 = \mu_L = 1$, is

$$
B = \begin{bmatrix} I_K \\ 0 \end{bmatrix}, \tag{13}
$$

where $I_K$ is the K x K identity matrix. In this case, the range of B consists of "short" signals
whose first K terms may be non-zero. The matrix $\hat B$ is simply the first K columns of the discrete
Fourier matrix, and so every entry has the same magnitude.

Our analytic results also depend on how diffuse the particular signal we are trying to recover,
w = Bh, is in the Fourier domain. With $\hat w = Fw = \hat B h$, we define

$$
\mu_h^2 = L \cdot \max_{1 \le \ell \le L} |\hat w(\ell)|^2 = L \cdot \max_{1 \le \ell \le L} |\langle h, \hat b_\ell \rangle|^2. \tag{14}
$$

If the signal w is more or less "flat" in the frequency domain, then $\mu_h$ will be a small constant.
Note that it is always the case that $1 \le \mu_h^2 \le \mu_1^2 K$.
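
The coherence parameters in (11), (12), and (14) are straightforward to compute for a given basis and coefficient vector. The sketch below is illustrative: the helper name `coherences`, the sizes L = 32 and K = 4, and the test vectors are assumptions, and h is normalized to unit norm, which is what the bounds on the signal coherence implicitly require. It evaluates the "short signal" basis of (13), and, as a contrasting (complex-valued) example, a basis built from K columns of the inverse DFT, whose energy is concentrated on K frequency rows.

```python
import numpy as np

def coherences(B, h):
    """Return the squared max/min basis coherences of (11), (12) and the signal coherence of (14)."""
    L, K = B.shape
    B_hat = np.fft.fft(B, axis=0) / np.sqrt(L)        # B_hat = F B with the normalized DFT
    row_energy = np.sum(np.abs(B_hat) ** 2, axis=1)   # ||b_hat_l||_2^2 for each row l
    mu_max_sq = (L / K) * row_energy.max()
    mu_min_sq = (L / K) * row_energy.min()
    w_hat = B_hat @ h
    mu_h_sq = L * np.max(np.abs(w_hat) ** 2)
    return mu_max_sq, mu_min_sq, mu_h_sq

L, K = 32, 4
B_short = np.eye(L)[:, :K]                            # the basis in (13)

spike = np.zeros(K); spike[0] = 1.0                   # w is a single spike: flat spectrum
flat = np.ones(K) / np.sqrt(K)                        # w is a constant on its support: peaked spectrum
print(coherences(B_short, spike))
print(coherences(B_short, flat))

# Contrasting case: B_hat has all its energy on K rows
B_conc = np.sqrt(L) * np.fft.ifft(np.eye(L), axis=0)[:, :K]
print(coherences(B_conc, spike)[:2])
```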

With the subspace in which w resides fixed, we will show that separating w and x = Cm will be
possible for "most" choices of the subspace C of a certain dimension N. We do this by choosing
the subspace at random from an isotropic distribution, and showing that (9) is successful with high
probability. For the remainder of the paper, we will take the entries of C to be independent and
identically distributed random variables,

$$
C[\ell, n] \sim \mathrm{Normal}(0, L^{-1}). \tag{15}
$$

In the Fourier domain, the entries of $\hat C$ will be complex Gaussian, and its columns will have
conjugate symmetry (since the columns of C are real). Specifically, the rows $\hat c_\ell$ will be distributed
as^2

$$
\hat c_\ell \sim
\begin{cases}
\mathrm{Normal}(0, I) & \ell = 1, \\
\mathrm{Normal}(0, 2^{-1/2} I) + j\,\mathrm{Normal}(0, 2^{-1/2} I) & \ell = 2, \ldots, L/2 + 1,
\end{cases}
\qquad
\hat c_\ell = \overline{\hat c_{L - \ell + 2}} \ \text{ for } \ell = L/2 + 2, \ldots, L.
$$

Similar results to those we present here most likely hold for other models for C. The key property
that our analysis hinges critically on is that the rows $\hat c_\ell$ of $\hat C$ are independent; this allows us to apply
recently developed tools for estimating the spectral norm of a sum of independent random linear
operators.

^2 We are assuming here that L is even; the argument is straightforward to adapt to odd L.

We now state our main result:

Theorem 1. Suppose the bases B, C and expansion coefficients h, m satisfy the conditions (10),
(11), (14), and (15) above. Fix $\alpha \ge 1$. Then there exists a constant $C_\alpha = O(\alpha)$ depending only on
$\alpha$, such that if

$$
L \ge C_\alpha \max(\mu_1^2 K, \mu_h^2 N) \log^3(KN), \tag{16}
$$

then $X_0 = hm^*$ is the unique solution to (9) with probability $1 - O(L(NK)^{-\alpha})$.

When the coherences are low, meaning that $\mu_1$ and $\mu_h$ are on the order of a constant, then (16)
comes within a logarithmic factor of the inherent number of degrees of freedom in the problem,
as it takes K + N variables to specify both h and m.

While Theorem 1 establishes theoretical guarantees for specific types of subspaces specified by
B and C, we have found that treating blind deconvolution as a linear inverse problem with a
rank constraint leads to surprisingly good results in many situations; see, for example, the image
deblurring experiments in Section 2.4.



The recovery can also be made stable in the presence of noise, as described by our second theorem:

Theorem 2. Let $X_0 = hm^*$ and $\mathcal{A}$ be as in (4), and suppose we observe

$$
\hat y = \mathcal{A}(X_0) + z,
$$

where $z \in \mathbb{R}^L$ is an unknown noise vector with $\|z\|_2 \le \delta$. If L obeys (16) and also
$L \le \mu_L^2 NK (2\beta \log(NK))^{-1}$ for some $\beta > 0$, then with probability $1 - O(L(NK)^{-\alpha})$ the solution
$\tilde X$ to

$$
\min \ \|X\|_* \quad \text{subject to} \quad \|\hat y - \mathcal{A}(X)\|_2 \le \delta \tag{17}
$$

will obey

$$
\|\tilde X - X_0\|_F \le C\, \frac{\mu_1}{\mu_L} \sqrt{\min(K, N)}\; \delta,
$$

for a fixed constant C.

The program in (17) is also convex, and is solved with numerical techniques similar to the equality
constrained program in (9).

In the end, we are interested in how well we recover x and w. The stability result for $X_0$ can easily
be extended to a guarantee for the two unknown vectors.

Corollary 1. Let $\tilde\sigma_1 \tilde u_1 \tilde v_1^*$ be the best rank-1 approximation to $\tilde X$, and set $\tilde h = \sqrt{\tilde\sigma_1}\, \tilde u_1$ and
$\tilde m = \sqrt{\tilde\sigma_1}\, \tilde v_1$. Set $\delta = \|\tilde X - X_0\|_F$. Then there exists a constant C such that

$$
\|h - \alpha \tilde h\|_2 \le C \min\left(\delta / \|h\|_2, \|h\|_2\right), \qquad
\|m - \alpha^{-1} \tilde m\|_2 \le C \min\left(\delta / \|m\|_2, \|m\|_2\right),
$$

for some scalar multiple $\alpha$.

The proof of this corollary follows the exact same line of reasoning as the later part of Theorem 1.2
in [4].
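
The factor extraction described in Corollary 1 can be sketched with a singular value decomposition. The code below is an illustration under stated assumptions: the function names `factor_rank_one` and `align`, the fixed real test vectors, and the noise level are all hypothetical, and the scalar $\alpha$ is chosen here by a least-squares fit to h, which is only possible in a simulation where h is known.

```python
import numpy as np

def factor_rank_one(X):
    """Split the best rank-1 approximation sigma_1 u_1 v_1^* of X into sqrt(sigma_1) u_1 and sqrt(sigma_1) v_1."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    h_est = np.sqrt(s[0]) * U[:, 0]
    m_est = np.sqrt(s[0]) * Vh[0, :].conj()
    return h_est, m_est

def align(h, m, h_est, m_est):
    """Resolve the scalar ambiguity (real data assumed) and return the two estimation errors."""
    alpha = np.dot(h_est, h) / np.dot(h_est, h_est)
    return np.linalg.norm(h - alpha * h_est), np.linalg.norm(m - m_est / alpha)

h = np.array([1.0, -2.0, 0.5, 3.0, 1.0])
m = np.array([2.0, 1.0, -1.0, 0.5, 1.0, -3.0, 2.0])
X0 = np.outer(h, m)

# Exact rank-1 input: the factors are recovered up to the scalar alpha
err_h, err_m = align(h, m, *factor_rank_one(X0))
print(err_h, err_m)

# Slightly perturbed input: the errors stay on the order of the perturbation
rng = np.random.default_rng(4)
X_noisy = X0 + 1e-4 * rng.standard_normal(X0.shape)
err_h_noisy, err_m_noisy = align(h, m, *factor_rank_one(X_noisy))
print(err_h_noisy, err_m_noisy)
```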
1.4 Relationship to phase retrieval and other quadratic problems

Blind deconvolution of w * x, as is apparent from (1), is equivalent to solving a system of quadratic
equations in the entries of w and x. The discussion in Section 1.1 shows how this system of
quadratic equations can be recast as a linear set of equations with a rank constraint. In fact, this
same recasting can be used for any system of quadratic equations in w and x. The reason is simple:
taking the outer product of the concatenation of w and x produces a rank-1 matrix that contains
all the different combinations of entries of w multiplied with each other and multiplied by entries
in x:





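$$
\left[\begin{array}{cccc|cccc}
w[1]^2 & w[1]w[2] & \cdots & w[1]w[L] & w[1]x[1] & w[1]x[2] & \cdots & w[1]x[L] \\
w[2]w[1] & w[2]^2 & \cdots & w[2]w[L] & w[2]x[1] & w[2]x[2] & \cdots & w[2]x[L] \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
w[L]w[1] & w[L]w[2] & \cdots & w[L]^2 & w[L]x[1] & w[L]x[2] & \cdots & w[L]x[L] \\
\hline
x[1]w[1] & x[1]w[2] & \cdots & x[1]w[L] & x[1]^2 & x[1]x[2] & \cdots & x[1]x[L] \\
x[2]w[1] & x[2]w[2] & \cdots & x[2]w[L] & x[2]x[1] & x[2]^2 & \cdots & x[2]x[L] \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
x[L]w[1] & x[L]w[2] & \cdots & x[L]w[L] & x[L]x[1] & x[L]x[2] & \cdots & x[L]^2
\end{array}\right] \tag{18}
$$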

Then any quadratic equation can be written as a linear combination of the entries in this matrix, and
any system of equations can be written as a linear operator acting on this matrix. For the particular
problem of blind deconvolution, we are observing sums along the skew-diagonals of the matrix in
the upper right-hand (or lower left-hand) quadrant. Incorporating the subspace constraints allows
us to work with the smaller K x N matrix $hm^*$, but this could also be interpreted as adding
additional linear constraints on the matrix in (18).
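
The statement about skew-diagonal sums can be verified directly. In the sketch below (L = 6 and the random vectors are illustrative assumptions), the lifted matrix is formed from the concatenation of w and x, the upper right-hand block is extracted, and its entries are summed along wrapped skew-diagonals, meaning all positions (i, j) with the same value of (i + j) mod L in 0-based indexing.

```python
import numpy as np

rng = np.random.default_rng(5)
L = 6                                            # illustrative length
w = rng.standard_normal(L)
x = rng.standard_normal(L)

z = np.concatenate([w, x])
lifted = np.outer(z, z)                          # the 2L x 2L rank-1 matrix in (18)
upper_right = lifted[:L, L:]                     # the block with entries w[i] * x[j]

# Sum each wrapped skew-diagonal: entries with (i + j) mod L == l
i, j = np.indices((L, L))
y_from_lift = np.array([upper_right[(i + j) % L == l].sum() for l in range(L)])

y_fft = np.real(np.fft.ifft(np.fft.fft(w) * np.fft.fft(x)))
print(np.allclose(y_from_lift, y_fft))
print(np.linalg.matrix_rank(lifted))
```

Each observation is therefore a fixed linear functional of the lifted matrix, which is what allows the quadratic system to be treated as a linear one with a rank constraint.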



Recent work on phase retrieval [4] has used this same methodology of "lifting" a quadratic problem
into a linear problem with a rank constraint to show that a vector $w \in \mathbb{R}^N$ can be recovered
from $O(N \log N)$ measurements of the form $|\langle w, a_n \rangle|^2$ for $a_n$ selected uniformly at random from
the unit sphere. In this case, the measurements are being made entirely in the upper left-hand
(or lower right-hand) quadrant in (18), and the measurements in (4) have the form $A_n = a_n a_n^*$.
In fact, another way to interpret the results in [4] is that if a signal of length L is known to live
in a generic subspace of dimension on the order of L/log L, then it can be recovered from an
observation of a convolution with itself.



In the current work, we are considering a non-symmetric rank-1 matrix being measured by matrices
$\hat b_\ell \hat c_\ell^*$ formed by the outer product of two different vectors, one of which is random, and one of which
is fixed. Another way to cast the problem, which perhaps brings these differences into sharper
relief, is that we are measuring the symmetric matrix in (18) by taking inner products against
rank-two matrices, each formed by placing $\hat b_\ell \hat c_\ell^*$ in an off-diagonal block and adding its
conjugate transpose. These seemingly subtle differences lead to a
much different mathematical treatment.



1.5 Application: Multipath channel protection using random codes 



The results in Section 1.3 have a direct application in the context of channel coding for transmitting
a message over an unknown multipath channel. The problem is illustrated in Figure 1. A message
vector $m \in \mathbb{R}^N$ is encoded through an L x N encoding matrix C. The protected message x = Cm
travels through a channel whose impulse response is w. The receiver observes y = w * x, and from
this would like to jointly estimate the channel and determine the message that was sent.

In this case, a reasonable model for the channel response w is that it is nonzero in a relatively
small number of known locations. Each of these entries corresponds to a different path over which
the encoded message traveled; we are assuming that we know the timing delays for each of these
paths, but not the fading coefficients. The matrix B in this case is a subset of columns from the
identity, and the $\hat b_\ell$ are partial Fourier vectors. This means that the coherence $\mu_1$ in (11) takes its
minimal value of $\mu_1^2 = 1$, and the coherence $\mu_h^2$ in (14) has a direct interpretation as the peak value
of the (normalized) frequency response of the unknown channel. The resulting linear operator $\mathcal{A}$
corresponds to a matrix comprised of N random Toeplitz matrices, each of size L x K, as shown in Figure 2.
The first column of each of these matrices corresponds to a column of C. The formulation of this
problem as a low-rank matrix recovery program was proposed in [5], which presented some first
numerical experiments.

In this context, Theorem 1 tells us that a length N message can be protected against a channel
with K reflections that is relatively flat in the frequency domain with a random code of length
$L \gtrsim (K + N) \log^3(KN)$. Essentially, we have a theoretical guarantee that we can estimate the
channel without knowledge of the message from a single transmitted codeword.

It is instructive to draw a comparison to previous work which connected error correction to
structured solutions to underdetermined systems of equations. In [6, 7], it was shown that a message
of length N could be protected against corruption in K unknown locations with a code of length
$L \gtrsim N + K \log(N/K)$ using a random codebook. This result was established by showing how the
decoding problem can be recast as a sparse estimation problem to which results from the field of
compressed sensing can be applied.

For multipath protection, we have a very different type of corruption: rather than individual
entries of the transmitted vector being tampered with, we instead observe overlapping copies of the
transmission. We show that the same type of codebook (i.e. entries chosen independently at
random) can protect against K reflections during transmission, where the timing of these bounces
is known (or can be reasonably estimated) but the fading coefficients (amplitude and phase change
associated with each reflection) are not.



1.6 Other related work 

As it is a ubiquitous problem, many different approaches for blind deconvolution have been
proposed in the past, each using different statistical or deterministic models tailored to particular
applications. A general overview for blind deconvolution techniques in imaging (including methods
based on parametric modeling of the inputs and incorporating spatial constraints) can be found
in [8]. An example of a more modern method can be found in [9], where it is demonstrated how an
image, which is expected to have small total-variation with respect to its energy, can be effectively
deconvolved from an unknown kernel with known compact support. In wireless communications,
knowledge of the modulation scheme [10] or an estimate of the statistics of the source signal [11]
have been used for blind channel identification; these methods are overviewed in the review
papers [12-15]. An effective scheme based on a deterministic model was put forth in [16], where
fundamental conditions for being able to identify multichannel responses from cross-correlations

Figure 1: Overview of the channel protection problem. A message m is encoded by applying a tall matrix 
C; the receiver observes the encoded message convolved with an unknown channel response w = Bh, where 
B is a subset of columns from the identity matrix. The decoder is faced with the task of separating the 
message and channel response from this convolution, which is a nonlinear combination of h and m. 




Figure 2: The multi-Toeplitz matrix corresponding to the multipath channel protection problem in
Section 1.5. In this case, the columns of B are sampled from the identity, the entries of C are chosen to be iid
Gaussian random variables, and the corresponding linear operator A is formed by concatenating N random
Toeplitz matrices of size L x K, each of which is generated by a column of C.

are presented. The work in this paper differs from this previous work in that it relies only on a
single observation of two convolved signals, the model for these signals is that they lie in known
(but arbitrary) subspaces rather than have a prescribed length, and we give a concrete relationship
between the dimensions of these subspaces and the length of the observation sufficient for perfect
recovery.

Recasting the quadratic problem in (1) as the linear problem with a rank constraint in (4) is
appealing since it puts the problem in a form for which we have recently acquired a tremendous
amount of understanding. Recovering an N x K rank-R matrix from a set of linear observations
has primarily been considered in two scenarios. In the case where the observations come through a
random projection, where either the $A_\ell$ are filled with independent Gaussian random variables or
$\mathcal{A}$ is an orthoprojection onto a randomly chosen subspace, the nuclear norm minimization program
in (9) is successful with high probability when [3, 17]

$$
L \ge \mathrm{Const} \cdot R \max(K, N).
$$

When the observations are randomly chosen entries in the matrix, then subject to incoherence
conditions on the singular vectors of the matrix being measured, the number of samples sufficient
for recovery, again with high probability, is [18-21]

$$
L \ge \mathrm{Const} \cdot R \max(K, N) \log^2(\max(K, N)).
$$
Our main result in Theorem 1 uses a completely different kind of measurement system which exhibits
a type of structured randomness; for example, when B has the form (13), $\mathcal{A}$ has the concatenated
Toeplitz structure shown in Figure 2. In this paper, we will only be concerned with how well this
type of operator can recover rank-1 matrices; ongoing work has shown that it can also effectively recover
general low-rank matrices [22].

While this paper is only concerned with recovery by nuclear norm minimization, other types of
recovery techniques have proven effective both in theory and in practice; see for example [23-25].
It is possible that the guarantees given in this paper could be extended to these other algorithms.

As we will see below, our mathematical analysis has mostly to do with how matrices of the form in (2) act
on rank-2 matrices in a certain subspace. Matrices of this type have been considered in the context
of sparse recovery in the compressed sensing literature for applications including multiple-input
multiple-output channel estimation [26], multi-user detection [27], and multiplexing of spectrally
sparse signals [28].



2 Numerical Simulations 

In this section, we illustrate the effectiveness of the reconstruction algorithm for the blind
deconvolution of vectors x and w with numerical experiments.^3 In particular, we study phase diagrams,
which demonstrate the empirical probability of success over a range of dimensions N and K for a
fixed L; an image deblurring experiment, where the task is to recover an image blurred by an
unknown blur kernel; and a channel protection experiment, where we show the robustness of our algorithm
in the presence of additive noise.

^3 MATLAB code that reproduces all of the experiments in this section is available at http://www.alialimed.org/code.html

Some of the numerical experiments presented below are "large scale", with thousands (and even
tens of thousands) of unknown variables. Recent advances in SDP solvers, which we discuss in the
following subsection, make the solution of such problems computationally feasible.



2.1 Large-scale solvers 



To solve the semidefinite program (8) on instances where K and N are of practical size, we rely on
the heuristic solver developed by Burer and Monteiro [29]. To implement this solver, we perform
the variable substitution

$$
\begin{bmatrix} H \\ M \end{bmatrix} \begin{bmatrix} H \\ M \end{bmatrix}^* = \begin{bmatrix} W_1 & X \\ X^* & W_2 \end{bmatrix},
$$

where H is K x r and M is N x r for $r \ge 1$. Under this substitution, the semidefinite constraint
is always satisfied and we are left with the nonlinear program:

$$
\min_{M, H} \ \|M\|_F^2 + \|H\|_F^2 \quad \text{subject to} \quad \hat y = \mathcal{A}(HM^*). \tag{19}
$$



When r = 1, this reformulated problem is equivalent to (6). Burer and Monteiro showed that
provided r is bigger than the rank of the optimal solution of (8), all of the local minima of (19)
are global minima of (8) [30]. Since we expect a rank one solution, we can work with r = 2,
declaring recovery when a rank deficient M or H is obtained. Thus, by doubling the size of the
decision variable, we can avoid the non-global local solutions of (6). Burer and Monteiro's algorithm
has had notable success in matrix completion problems, enabling some of the fastest solvers for
nuclear-norm-based matrix completion [31, 32].



To solve (19), we implement the method of multipliers strategy initially suggested by Burer and
Monteiro. This algorithm is explained in detail by Recht et al. in the context of solving
problem (9) [3]. The inner operation of minimizing the augmented Lagrangian term is performed
using LBFGS as implemented by the MATLAB solver minfunc [33]. This solver requires only being
able to apply $\mathcal{A}$ and $\mathcal{A}^*$ quickly, both of which can be done in time $O(r \min\{N \log N, K \log K\})$.
The parameters of the augmented Lagrangian are updated according to the schedule proposed by
Burer and Monteiro [29]. This code allows us to solve problems where N and K are in the tens of
thousands in seconds on a laptop.
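
To illustrate why the factored form is convenient, the sketch below applies the measurement operator to the factors H and M directly with FFTs, without ever forming the K x N product, and checks the result against an explicit evaluation. This is only the forward map and the objective of (19), not the method of multipliers itself. The sizes, the choice of B as the first K columns of the identity, and the function names `A_factored` and `A_explicit` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(6)
L, K, N, r = 32, 4, 5, 2                        # illustrative sizes, rank-2 factors
C = rng.standard_normal((L, N)) / np.sqrt(L)
H = rng.standard_normal((K, r))
M = rng.standard_normal((N, r))

def A_factored(H, M):
    """Apply A to H M^* using the factors only, for B = first K columns of the identity."""
    W = np.zeros((L, H.shape[1]))
    W[:K] = H                                   # B @ H is zero padding of H to length L
    X = C @ M.conj()                            # L x r
    # Column by column, the product of spectra is the transform of a circular convolution
    return np.sum(np.fft.fft(W, axis=0) * np.fft.fft(X, axis=0), axis=1) / np.sqrt(L)

def A_explicit(X):
    """Reference evaluation: y_hat[l] = b_hat_l^* X c_hat_l."""
    B_hat = np.fft.fft(np.eye(L)[:, :K], axis=0) / np.sqrt(L)
    C_hat_scaled = np.fft.fft(C, axis=0)        # sqrt(L) * F @ C
    return np.einsum('lk,kn,ln->l', B_hat, X, C_hat_scaled)

objective = np.linalg.norm(M, 'fro') ** 2 + np.linalg.norm(H, 'fro') ** 2   # cost in (19)
print(np.allclose(A_factored(H, M), A_explicit(H @ M.conj().T)))
```

A quasi-Newton inner solver of the kind described above would call a routine like `A_factored` (and the corresponding adjoint) at every iteration, which is why a fast implementation matters.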



2.2 Phase transitions 

Our first set of numerical experiments delineates the boundary, in terms of values for N and K,
for when (9) is effective on generic instances of four different types of problems. For a fixed value
of L, we vary the subspace dimensions N and K and run 100 experiments, with different random
instances of w and x for each experiment. Figures 3 and 4 show the collected frequencies of success
for four different probabilistic models. We classify a recovery as a success if its relative error is less
than 2%,^4 meaning that if $\tilde X$ is the solution to (9), then

$$
\frac{\|\tilde X - wx^*\|_F}{\|wx^*\|_F} < 0.02. \tag{20}
$$

^4 The diagrams in Figures 3 and 4 do not change significantly if a much smaller threshold is chosen.

Figure 3: Empirical success rate for the deconvolution of two vectors x and w. In these experiments, x is
a random vector in the subspace spanned by the columns of an L x N matrix whose entries are independent
and identically distributed Gaussian random variables. In part (a), w is a generic sparse vector, with support
and nonzero entries chosen randomly. In part (b), w is a generic short vector whose first K terms are nonzero
and chosen randomly.



Our first set of experiments mimics the channel protection problem from Section 1.5 and Figure 1.
Figure 3 shows the empirical rate of success when C is taken as a dense L x N Gaussian random
matrix. We fix L = 2048 and vary N and K from 25 to 1000. In Figure 3(a), we take w to be sparse
with known support; we form B by randomly selecting K columns from the L x L identity matrix.
For Figure 3(b), we take w to be "short", forming B from the first K columns of the identity. In
both cases, the basis expansion coefficients were drawn as iid Gaussian random vectors. In both
cases, we are able to deconvolve these signals with a high rate of success when $L \ge 2.7(K + N)$.

Figure 4 shows the results of a similar experiment, only here both w and x are randomly generated
sparse vectors. We take L to be much larger than in the previous experiment, L = 32,768, and vary
N and K from 1000 to 16,000. In Figure 4(a), we generate both B and C by randomly selecting
columns of the identity; despite the difference in the model for x (sparse instead of randomly
oriented), the resulting performance curve in this case is very similar to that in Figure 3(a). In
Figure 4(b), we use the same model for C but use a "short" w (first K terms are non-zero).
Again, despite the difference in the model for x, the recovery curve looks almost identical to that
in Figure 3(b).



2.3 Recovery in the presence of noise 



Figure 5 demonstrates the robustness of the deconvolution algorithm in the presence of noise. We
use the same basic experimental setup as in Figure 3(a), with L = 2048, N = 500 and K = 250,
but instead of making a clean observation of w * x, we add a noise vector z whose entries are iid
Gaussian with zero mean and variance $\sigma^2$. We solve the program (17) with $\delta = (L + \sqrt{4L})^{1/2} \sigma$, a
value chosen since it will be an upper bound for $\|z\|_2$ with high probability.
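
The choice of $\delta$ can be sanity-checked by simulation. Since $\|z\|_2^2/\sigma^2$ is a chi-square variable with L degrees of freedom (mean L, standard deviation $\sqrt{2L}$), the threshold $L + \sqrt{4L}$ sits about 1.4 standard deviations above the mean, so a normal approximation suggests the bound holds for roughly nine out of ten noise draws. The sketch below estimates this fraction empirically; the values of `sigma` and `trials` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)
L, sigma, trials = 2048, 0.1, 2000               # L as in the experiment; sigma and trials are arbitrary

delta = np.sqrt(L + np.sqrt(4 * L)) * sigma      # delta = (L + sqrt(4L))^(1/2) * sigma
z = sigma * rng.standard_normal((trials, L))     # one noise vector per row
frac_within = np.mean(np.linalg.norm(z, axis=1) <= delta)

print(delta, frac_within)
```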

Figure 5(a) shows how the relative error of the recovery changes with the noise level $\sigma$. On a
log-log scale, the recovery error (shown as $10 \log_{10}$(relative error squared)) is linear in the
signal-to-noise ratio (SNR, expressed in dB as $10 \log_{10}$ of the energy $\|wx^*\|_F^2$ relative to the noise
energy). For each SNR level, we calculate the average
relative error squared over 100 iterations, each time using an independent set of signals, coding matrix,
and noise. Figure 5(b) shows how the recovery error is affected by the "oversampling ratio"; as L
is made larger relative to N + K, the recovery error decreases. As before, each point is averaged
over 100 independent iterations.

Figure 4: Empirical success rate for the deconvolution of two vectors x and w. In these experiments, x is a
random sparse vector whose support and N non-zero values on that support are chosen at random. In part
(a), w is a generic sparse vector, with support and K nonzero entries chosen randomly. In part (b), w is a
generic short vector whose first K terms are nonzero and chosen randomly.



2.4 Image deblurring 



Figures 6, 7, and 8 illustrate an application of our blind deconvolution technique to two image deblurring problems. In the first problem, we assume that we have oracle knowledge of a low-dimensional subspace in which the image to be recovered lies. We observe a convolution of the 65,536-pixel Shapes image shown in Figure 6(a) with the motion blurring kernel shown in Figure 6(b); the observation is shown in Figure 6(c). The Shapes image can be very closely approximated using only N = 5000 terms in a Haar wavelet expansion, which capture 99.9% of the energy in the image. We start by assuming (perhaps unrealistically) that we know the indices for these most significant wavelet coefficients; the corresponding wavelet basis functions are taken as columns of B. We will also assume that we know the support of the blurring kernel, which consists of K = 65 connected pixels; the corresponding columns of the identity constitute C. The image and blur kernel recovered by solving (9) are shown in Figure 7.

Figure 8 shows a more realistic example where the support of the image in the wavelet domain is unknown. We take the blurred image shown in Figure 6(c) and, as before, we assume we know the support of the blurring kernel shown in Figure 6(b), with K = 65 non-zero elements, but here we use the blurred image to estimate the support in the wavelet domain: we take the Haar wavelet transform of the image in Figure 6(c) and select the indices of the N = 9000 largest wavelet coefficients as a proxy for the support of the significant coefficients of the original image. The wavelet coefficients of the original image at this estimated support capture 98.5% of the energy in the blurred image. The recovery using (9) run with these linear models is shown in Figure 8(a) and Figure 8(b). Despite not knowing the linear model explicitly, we are able to estimate it well enough from the observed data to get a reasonable reconstruction.

Figure 5: Performance of the blind deconvolution program in the presence of noise. In all of the experiments, L = 2048, N = 500, K = 250, B is a random selection of columns from the identity, and C is an iid Gaussian matrix. (a) Relative error vs. SNR on a log-log scale. (b) Oversampling rate vs. relative error for a fixed SNR of 20 dB.

Figure 6: Shapes image for deblurring experiment. (a) Original 256 x 256 Shapes image x. (b) Blurring kernel w with a support size of 65 pixels, the locations of which are assumed to be known. (c) Convolution of (a) and (b).



3 Proof of main theorems 

In this section, we will prove Theorems 1 and 2 by establishing a set of standard sufficient conditions for $X_0$ to be the unique minimizer of (9). At a high level, the argument follows previous literature [21, 34] on low-rank matrix recovery by constructing a valid dual certificate for the rank-1 matrix $X_0 = hm^*$. The main mathematical innovation in proving these results comes in Lemmas 1, 2, 3, and 4, which control the behavior of the random operator $\mathcal{A}$.

We will work through the main argument in this section, leaving the technical details (including the proofs of the main lemmas) until Sections 5 and 6.

Figure 7: An oracle-assisted image deblurring experiment; we assume that we know the support of the 5000 most significant wavelet coefficients of the original image. These wavelet coefficients capture 99.9% of the energy in the original image. (a) Deconvolved image $\tilde{x}$ obtained from the solution of (9), with relative error $\|\tilde{x} - x\|_2/\|x\|_2 = 1.6 \times 10^{-?}$. (b) Estimated blur kernel $\tilde{w}$ with relative error $\|\tilde{w} - w\|_2/\|w\|_2 = 5.4 \times 10^{-?}$.

Figure 8: Image recovery without oracle information. We take the support of the 9000 most significant coefficients of the Haar wavelet transform of the blurred image as our estimate of the subspace in which the original image lives. (a) Deconvolved image obtained from the solution of (9), with relative error of $4.9 \times 10^{-?}$. (b) Estimated blur kernel; relative error $= 5.6 \times 10^{-?}$.

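Key to our argument is the subspace (of $\mathbb{R}^{K \times N}$) T associated with $X_0 = hm^*$:

\[ T = \{X : X = \alpha h v^* + \beta u m^*, \ v \in \mathbb{R}^N, \ u \in \mathbb{R}^K, \ \alpha, \beta \in \mathbb{R}\} \]

with the (matrix) projection operators

\[ \mathcal{P}_T(X) = P_h X + X P_m - P_h X P_m, \qquad \mathcal{P}_{T^\perp}(X) = (I - P_h) X (I - P_m), \]

where $P_h$ and $P_m$ are the (vector) projection matrices $P_h = hh^*$ and $P_m = mm^*$.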
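The two projections can be checked numerically. The sketch below is illustrative only: the dimensions and the random unit vectors `h`, `m` are assumptions, and the helper names `proj_T` and `proj_T_perp` are hypothetical. It verifies that the two operators split any matrix into orthogonal pieces and that $hm^*$ lies in T.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N = 5, 7  # illustrative dimensions

def unit(v):
    """Normalize a vector to unit Euclidean norm."""
    return v / np.linalg.norm(v)

h = unit(rng.standard_normal(K))
m = unit(rng.standard_normal(N))
Ph, Pm = np.outer(h, h), np.outer(m, m)  # P_h = h h^*, P_m = m m^*

def proj_T(X):
    """P_T(X) = P_h X + X P_m - P_h X P_m."""
    return Ph @ X + X @ Pm - Ph @ X @ Pm

def proj_T_perp(X):
    """P_{T^perp}(X) = (I - P_h) X (I - P_m)."""
    return (np.eye(K) - Ph) @ X @ (np.eye(N) - Pm)

X = rng.standard_normal((K, N))
XT, XTp = proj_T(X), proj_T_perp(X)
X0 = np.outer(h, m)

print("decomposition error:", np.linalg.norm(XT + XTp - X))
print("inner product <P_T X, P_Tperp X>:", np.sum(XT * XTp))
```

Note that the formulas rely on h and m having unit norm, which is what makes $hh^*$ and $mm^*$ orthogonal projectors.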
3.1 Theorem 1: Sufficient condition for a nuclear norm minimizer

The following proposition is a specialization of the more general sufficient conditions for verifying the solutions to the nuclear norm minimization problem (9) that have appeared multiple times in the literature in one form or another (see [18], for example).

Proposition 1. The matrix $X_0 = hm^*$ is the unique minimizer to (9) if there exists a $Y \in \mathrm{Range}(\mathcal{A}^*)$ such that

\[ \langle hm^* - \mathcal{P}_T(Y), \mathcal{P}_T(Z)\rangle_F - \langle \mathcal{P}_{T^\perp}(Y), \mathcal{P}_{T^\perp}(Z)\rangle_F + \|\mathcal{P}_{T^\perp}(Z)\|_* > 0 \]

for all $Z \in \mathrm{Null}(\mathcal{A})$.

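Since

\[ \langle hm^* - \mathcal{P}_T(Y), \mathcal{P}_T(Z)\rangle_F - \langle \mathcal{P}_{T^\perp}(Y), \mathcal{P}_{T^\perp}(Z)\rangle_F + \|\mathcal{P}_{T^\perp}(Z)\|_* \]
\[ \ge -\|hm^* - \mathcal{P}_T(Y)\|_F \|\mathcal{P}_T(Z)\|_F - \|\mathcal{P}_{T^\perp}(Y)\| \, \|\mathcal{P}_{T^\perp}(Z)\|_* + \|\mathcal{P}_{T^\perp}(Z)\|_*, \]

it is enough to find a $Y \in \mathrm{Range}(\mathcal{A}^*)$ such that

\[ -\|hm^* - \mathcal{P}_T(Y)\|_F \|\mathcal{P}_T(Z)\|_F + (1 - \|\mathcal{P}_{T^\perp}(Y)\|) \|\mathcal{P}_{T^\perp}(Z)\|_* > 0, \tag{21} \]

for all $Z \in \mathrm{Null}(\mathcal{A})$.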
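The lower bound used above combines the Cauchy-Schwarz inequality for the Frobenius inner product with the duality between the spectral and nuclear norms. A small numerical sketch, using arbitrary real test matrices `Y` and `Z` (an assumption; the inequality itself does not need the range or null-space constraints), compares the two sides:

```python
import numpy as np

rng = np.random.default_rng(2)
K, N = 4, 6  # illustrative dimensions

h = rng.standard_normal(K); h /= np.linalg.norm(h)
m = rng.standard_normal(N); m /= np.linalg.norm(m)
Ph, Pm = np.outer(h, h), np.outer(m, m)
X0 = np.outer(h, m)

proj_T = lambda X: Ph @ X + X @ Pm - Ph @ X @ Pm
proj_T_perp = lambda X: (np.eye(K) - Ph) @ X @ (np.eye(N) - Pm)
fro = lambda A: np.linalg.norm(A, 'fro')
nuc = lambda A: np.linalg.norm(A, 'nuc')
spec = lambda A: np.linalg.norm(A, 2)

gaps = []
for _ in range(200):
    Y = rng.standard_normal((K, N))
    Z = rng.standard_normal((K, N))
    # Left-hand side: the expression appearing in Proposition 1
    lhs = (np.sum((X0 - proj_T(Y)) * proj_T(Z))
           - np.sum(proj_T_perp(Y) * proj_T_perp(Z))
           + nuc(proj_T_perp(Z)))
    # Right-hand side: the norm-based lower bound
    rhs = (-fro(X0 - proj_T(Y)) * fro(proj_T(Z))
           + (1 - spec(proj_T_perp(Y))) * nuc(proj_T_perp(Z)))
    gaps.append(lhs - rhs)

min_gap = min(gaps)
print("smallest (lhs - rhs) over trials:", min_gap)
```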



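In Lemma 1 in Section 3.4 below we show that $\|\mathcal{A}\| \le \sqrt{4.5\,\mu_{\max}^2 NK/L} =: \gamma \lesssim \sqrt{N}$ with appropriately high probability. Corollary 2 below also shows that with L obeying (16),

\[ \|\mathcal{A}(\mathcal{P}_T(Z))\|_F \ge 2^{-1/2}\|\mathcal{P}_T(Z)\|_F \quad \text{for all } Z \in \mathrm{Null}(\mathcal{A}), \]

with the appropriate probability. Then, since

\[ 0 = \|\mathcal{A}(Z)\|_F \ge \|\mathcal{A}(\mathcal{P}_T(Z))\|_F - \|\mathcal{A}(\mathcal{P}_{T^\perp}(Z))\|_F \ge \frac{1}{\sqrt{2}}\|\mathcal{P}_T(Z)\|_F - \gamma\|\mathcal{P}_{T^\perp}(Z)\|_F, \]

we will have that

\[ \|\mathcal{P}_T(Z)\|_F \le \sqrt{2}\,\gamma\,\|\mathcal{P}_{T^\perp}(Z)\|_F \le \sqrt{2}\,\gamma\,\|\mathcal{P}_{T^\perp}(Z)\|_*. \tag{22} \]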



Applying this fact to (21), we see that it is sufficient to find a $Y \in \mathrm{Range}(\mathcal{A}^*)$ such that

\[ \bigl(1 - \sqrt{2}\,\gamma\,\|hm^* - \mathcal{P}_T(Y)\|_F - \|\mathcal{P}_{T^\perp}(Y)\|\bigr)\,\|\mathcal{P}_{T^\perp}(Z)\|_* > 0. \]

Since Lemma 2 also implies that $\mathcal{P}_{T^\perp}(Z) \ne 0$ for $Z \in \mathrm{Null}(\mathcal{A})$, our approach will be to construct a $Y \in \mathrm{Range}(\mathcal{A}^*)$ such that

\[ \|hm^* - \mathcal{P}_T(Y)\|_F \le \frac{1}{4\sqrt{2}\,\gamma} \quad \text{and} \quad \|\mathcal{P}_{T^\perp}(Y)\| < \frac{3}{4}. \tag{23} \]

In the next section, we will show how such a Y can be found using Gross's golfing scheme [20, 34].

3.2 Construction of the dual certificate via golfing



The golfing scheme works by dividing the L linear observations of $X_0$ into P disjoint subsets of size Q, and then using these subsets of observations to iteratively construct the dual certificate Y. We index these subsets by $\Gamma_1, \ldots, \Gamma_P$; by construction $|\Gamma_p| = Q$, $\bigcup_p \Gamma_p = \{1, \ldots, L\}$, and $\Gamma_p \cap \Gamma_{p'} = \emptyset$ for $p \ne p'$. We define $\mathcal{A}_p$ to be the operator that returns the measurements indexed by the set $\Gamma_p$:

\[ \mathcal{A}_p(W) = \{\mathrm{trace}(c_k b_k^* W)\}_{k \in \Gamma_p}, \qquad \mathcal{A}_p^*\mathcal{A}_p W = \sum_{k \in \Gamma_p} b_k b_k^* W c_k c_k^*. \]

The $\mathcal{A}_p^*\mathcal{A}_p$ are random linear operators; the expectation of their action on a fixed matrix W is

\[ \mathrm{E}[\mathcal{A}_p^*\mathcal{A}_p W] = \sum_{k \in \Gamma_p} b_k b_k^* W. \]

For reasons that will become clear as we proceed through the argument below, we would like this expectation to be as close to a scalar multiple of W as possible for all p. In other words, we would like to partition the L rows of the matrix B into P different $Q \times K$ submatrices, each of which is well-conditioned (i.e. the columns are almost orthogonal to one another).

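Results from the literature on compressive sensing have shown that such a partition exists for $L \times K$ matrices with orthonormal columns whose rows all have about the same energy. In particular, the proof of Theorem 1.2 in [35] shows that if B is an $L \times K$ matrix with $B^*B = I$, $\Gamma$ is a randomly selected subset of $\{1, \ldots, L\}$ of size Q, and the rows of B have coherence $\mu_{\max}^2$ as in (11), then there exists a constant C such that for any $\epsilon \in (0,1)$ and $\delta \in (0,1)$,

\[ Q \ge C\,\frac{\mu_{\max}^2 K}{\epsilon^2}\,\max\{\log K, \log(1/\delta)\} \]

implies

\[ \Bigl\| \sum_{k \in \Gamma} b_k b_k^* - \frac{Q}{L} I \Bigr\| \le \frac{\epsilon Q}{L} \]

with probability exceeding $1 - \delta$. If our partition $\Gamma_1, \Gamma_2, \ldots, \Gamma_P$ is random, then applying the above result with $\delta = (KN)^{-1}$ and $\epsilon = 1/4$ tells us that if

\[ Q \ge C \mu_{\max}^2 K \log(KN), \tag{24} \]

then

\[ \max_{1 \le p \le P} \Bigl\| \sum_{k \in \Gamma_p} b_k b_k^* - \frac{Q}{L} I \Bigr\| \le \frac{Q}{4L} \quad \Rightarrow \quad \max_{1 \le p \le P} \Bigl\| \sum_{k \in \Gamma_p} b_k b_k^* \Bigr\| \le \frac{5Q}{4L}, \tag{25} \]

with positive probability. This means that with Q chosen to obey (24), at least one such partition must exist, and we move forward assuming that (25) holds.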
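To make the partition condition concrete, the following sketch compares a random partition with an equispaced one. It assumes, purely for illustration, that B consists of the first K columns of the unitary DFT matrix (a choice not fixed by the argument above); in that special case each equispaced subset with $K \le Q$ gives $\sum_{k \in \Gamma_p} b_k b_k^* = (Q/L) I$ exactly.

```python
import numpy as np

rng = np.random.default_rng(3)
L, K, P = 256, 8, 4  # illustrative sizes
Q = L // P

# Assumed example: B = first K columns of the unitary L x L DFT matrix
rows = np.arange(L)[:, None]
cols = np.arange(K)[None, :]
B = np.exp(-2j * np.pi * rows * cols / L) / np.sqrt(L)

def deviation(idx):
    """Spectral norm of sum_{k in idx} b_k b_k^* - (Q/L) I."""
    Bs = B[idx, :]
    G = Bs.conj().T @ Bs
    return np.linalg.norm(G - (Q / L) * np.eye(K), 2)

perm = rng.permutation(L)
random_parts = [perm[p * Q:(p + 1) * Q] for p in range(P)]
equi_parts = [np.arange(p, L, P) for p in range(P)]

dev_random = max(deviation(idx) for idx in random_parts)
dev_equi = max(deviation(idx) for idx in equi_parts)
print(f"target Q/(4L) = {Q / (4 * L):.4f}")
print(f"random partition max deviation     = {dev_random:.4f}")
print(f"equispaced partition max deviation = {dev_equi:.2e}")
```

For small Q a random partition may or may not meet the $Q/(4L)$ target; the condition (24) is what guarantees that a good partition exists.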



Along with the expectation of each of the $\mathcal{A}_p^*\mathcal{A}_p$ being close to a multiple of the identity, we will also need tail bounds stating that $\mathcal{A}_p^*\mathcal{A}_p$ is close to its expectation with high probability. The failure probabilities can be made smaller by making the subset size Q larger. As detailed below, we will need

\[ Q \ge C_\alpha M \log(KN) \log M, \quad \text{where } M = \max(\mu_{\max}^2 K, \mu_h^2 N), \tag{26} \]

to make these probability bounds meaningful. Note that this means (24) will hold.



The construction of Y that obeys the conditions (23) relies on three technical lemmas which are stated below in Section 3.4. Their proofs rely heavily on re-writing different quantities of interest (linear operators, vectors, and scalars) as a sum of independent subexponential random variables and then using a specialized version of the "Matrix Bernstein Inequality" to estimate their sizes. Section 4 below contains a brief overview of these types of probabilistic bounds. The proofs of the key lemmas (2, 3, and 4) are in Section 5. These proofs rely on several miscellaneous lemmas which compute simple expectations and tail bounds for various random variables; these are presented separately in Section 6.

With the $\Gamma_p$ chosen and the key lemmas established, we construct Y as follows. Let $Y_0 = 0$, and then iteratively define

\[ Y_p = Y_{p-1} + \frac{L}{Q}\,\mathcal{A}_p^*\mathcal{A}_p\bigl(hm^* - \mathcal{P}_T(Y_{p-1})\bigr). \]

We will show that under appropriate conditions on L, taking $Y := Y_P$ will satisfy both parts of (23) with high probability.

Let $W_p$ be the residual between $Y_p$ projected onto T and the target $hm^*$:

\[ W_p = \mathcal{P}_T(Y_p) - hm^*. \]

Notice that $W_p \in T$ and

\[ W_0 = -hm^*, \qquad W_p = \frac{L}{Q}\Bigl(\frac{Q}{L}\mathcal{P}_T - \mathcal{P}_T\mathcal{A}_p^*\mathcal{A}_p\mathcal{P}_T\Bigr) W_{p-1}. \tag{27} \]

Applying Lemma 2 iteratively to the $W_p$ tells us that

\[ \|W_p\|_F \le \frac{1}{2}\|W_{p-1}\|_F \le 2^{-p}\|hm^*\|_F = 2^{-p}, \quad p = 1, \ldots, P, \tag{28} \]

with probability exceeding $1 - 3P(KN)^{-\alpha}$. Thus we will have

\[ \|hm^* - \mathcal{P}_T(Y_P)\|_F \le \frac{1}{4\sqrt{2}\,\gamma} \]

for

\[ P \ge \frac{\log(4\sqrt{2}\,\gamma)}{\log 2}, \quad \text{which can be achieved with } L \ge C\,Q \log(KN), \]

which holds under our assumptions on L in the theorem statement (16) and our choice of Q in (26).



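To bound $\|\mathcal{P}_{T^\perp}(Y_P)\|$, we use the expansion

\[ Y_P = Y_{P-1} - \frac{L}{Q}\mathcal{A}_P^*\mathcal{A}_P W_{P-1} = Y_{P-2} - \frac{L}{Q}\mathcal{A}_{P-1}^*\mathcal{A}_{P-1} W_{P-2} - \frac{L}{Q}\mathcal{A}_P^*\mathcal{A}_P W_{P-1} = \cdots = -\sum_{p=1}^{P} \frac{L}{Q}\mathcal{A}_p^*\mathcal{A}_p W_{p-1}, \]

and so

\[ \|\mathcal{P}_{T^\perp}(Y_P)\| = \Bigl\| \mathcal{P}_{T^\perp}\Bigl( \sum_{p=1}^{P} \frac{L}{Q}\bigl(\mathcal{A}_p^*\mathcal{A}_p W_{p-1} - \tfrac{Q}{L} W_{p-1}\bigr) \Bigr) \Bigr\| \quad (\text{since } W_{p-1} \in T) \]
\[ \le \sum_{p=1}^{P} \frac{L}{Q} \bigl\| \mathcal{P}_{T^\perp}\bigl(\mathcal{A}_p^*\mathcal{A}_p W_{p-1} - \tfrac{Q}{L} W_{p-1}\bigr) \bigr\| \le \sum_{p=1}^{P} \frac{L}{Q} \bigl\| \mathcal{A}_p^*\mathcal{A}_p W_{p-1} - \tfrac{Q}{L} W_{p-1} \bigr\|. \]

Lemma 4 shows that with probability exceeding $1 - P(KN)^{-\alpha}$,

\[ \bigl\| \mathcal{A}_p^*\mathcal{A}_p W_{p-1} - \tfrac{Q}{L} W_{p-1} \bigr\| \le 2^{-p}\,\frac{3Q}{4L} \quad \text{for all } p = 1, \ldots, P, \]

and so

\[ \|\mathcal{P}_{T^\perp}(Y_P)\| \le \sum_{p=1}^{P} \frac{3}{4}\,2^{-p} < \frac{3}{4}. \]

Collecting the results above, we see that both conditions in (23) will hold with probability exceeding $1 - O(L(NK)^{-\alpha})$ when Q is chosen as in (26) and L is chosen as in (16).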
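The golfing iteration can be simulated on a small problem. The sketch below is not the experiment from the paper: it assumes B is the first K columns of the unitary DFT, takes the $c_k$ to be iid circular complex Gaussian vectors with identity covariance (a simplification of the model in (15)), and uses an equispaced partition. All sizes and helper names (`AtA`, `proj_T`) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
L, K, N, P = 8192, 4, 4, 4  # illustrative sizes with Q much larger than K + N
Q = L // P

# Row k of Bv holds the vector b_k (first K columns of the unitary DFT).
Bv = np.exp(-2j * np.pi * np.arange(L)[:, None] * np.arange(K)[None, :] / L) / np.sqrt(L)
# Row k of Cv holds c_k; assumed iid circular complex Gaussian, E[c c^*] = I.
Cv = (rng.standard_normal((L, N)) + 1j * rng.standard_normal((L, N))) / np.sqrt(2)

h = rng.standard_normal(K); h /= np.linalg.norm(h)
m = rng.standard_normal(N); m /= np.linalg.norm(m)
Ph, Pm = np.outer(h, h), np.outer(m, m)
proj_T = lambda X: Ph @ X + X @ Pm - Ph @ X @ Pm
proj_T_perp = lambda X: (np.eye(K) - Ph) @ X @ (np.eye(N) - Pm)

def AtA(W, idx):
    """A_p^* A_p W = sum_{k in idx} b_k b_k^* W c_k c_k^*."""
    b, c = Bv[idx], Cv[idx]
    y = np.einsum('ki,ij,kj->k', b.conj(), W, c)     # y_k = b_k^* W c_k
    return np.einsum('k,ki,kj->ij', y, b, c.conj())  # sum_k y_k b_k c_k^*

X0 = np.outer(h, m).astype(complex)
Y = np.zeros((K, N), dtype=complex)
residuals = []
for p in range(P):
    idx = np.arange(p, L, P)  # equispaced subset Gamma_p
    Y = Y + (L / Q) * AtA(X0 - proj_T(Y), idx)
    residuals.append(np.linalg.norm(proj_T(Y) - X0))  # ||W_p||_F

perp_norm = np.linalg.norm(proj_T_perp(Y), 2)
print("||W_p||_F per step:", np.round(residuals, 4))
print("||P_Tperp(Y_P)||  :", round(float(perp_norm), 4))
```

In runs of this kind the residual $\|W_p\|_F$ shrinks from step to step while the spectral norm of the off-tangent part stays bounded, which is the behavior that (23) asks for.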



3.3 Theorem 2: Stability 

With the conditions on L, we can establish three important intermediate results, each of which occurs with probability $1 - O(L(NK)^{-\alpha})$. The first is the existence of a dual certificate Y, as constructed in the previous section, that obeys the conditions (23). The second, given to us by Lemma 1, is that the operator $\mathcal{A}\mathcal{A}^*$ is well conditioned, meaning that its eigenvalues obey

\[ 0.48\,\mu_{\min}^2\frac{NK}{L} \le \lambda_{\min}(\mathcal{A}\mathcal{A}^*) \le \lambda_{\max}(\mathcal{A}\mathcal{A}^*) \le 4.5\,\mu_{\max}^2\frac{NK}{L}. \tag{29} \]

The third, given to us by Corollary 2, is that in addition, $\mathcal{A}^*\mathcal{A}$ is well conditioned on T: $\|\mathcal{P}_T\mathcal{A}^*\mathcal{A}\mathcal{P}_T - \mathcal{P}_T\| \le 1/2$.



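With these facts in place, the stability proof follows the template set in [34, 36]. We start with two observations; first, the feasibility of $X_0$ implies

\[ \|\tilde{X}\|_* \le \|X_0\|_*, \tag{30} \]

and

\[ \|\mathcal{A}(\tilde{X} - X_0)\|_2 \le \|\hat{y} - \mathcal{A}(X_0)\|_2 + \|\mathcal{A}(\tilde{X}) - \hat{y}\|_2 \le 2\delta. \tag{31} \]

Set $\tilde{X} = X_0 + \xi$. The result (29) tells us that the orthogonal projection $\mathcal{P}_{\mathcal{A}} = \mathcal{A}^*(\mathcal{A}\mathcal{A}^*)^{-1}\mathcal{A}$ onto the range of $\mathcal{A}^*$ is well defined. We then break apart the recovery error as

\[ \|\xi\|_F^2 = \|\mathcal{P}_{\mathcal{A}}(\xi)\|_F^2 + \|\mathcal{P}_{\mathcal{A}^\perp}(\xi)\|_F^2 = \|\mathcal{P}_{\mathcal{A}}(\xi)\|_F^2 + \|\mathcal{P}_T\mathcal{P}_{\mathcal{A}^\perp}(\xi)\|_F^2 + \|\mathcal{P}_{T^\perp}\mathcal{P}_{\mathcal{A}^\perp}(\xi)\|_F^2. \tag{32} \]

A direct result of Proposition 1 is that there exists a constant $C > 0$ such that for all $Z \in \mathrm{Null}(\mathcal{A})$, $\|X_0 + Z\|_* - \|X_0\|_* \ge C\|\mathcal{P}_{T^\perp}(Z)\|_*$. Since $\mathcal{P}_{\mathcal{A}^\perp}(\xi) \in \mathrm{Null}(\mathcal{A})$, we have

\[ \|X_0 + \mathcal{P}_{\mathcal{A}^\perp}(\xi)\|_* - \|X_0\|_* \ge C\|\mathcal{P}_{T^\perp}\mathcal{P}_{\mathcal{A}^\perp}(\xi)\|_*. \]

Combining this with (30) and the triangle inequality yields

\[ \|X_0\|_* \ge \|X_0\|_* + C\|\mathcal{P}_{T^\perp}\mathcal{P}_{\mathcal{A}^\perp}(\xi)\|_* - \|\mathcal{P}_{\mathcal{A}}(\xi)\|_*, \]

which implies

\[ \|\mathcal{P}_{T^\perp}\mathcal{P}_{\mathcal{A}^\perp}(\xi)\|_* \le C\|\mathcal{P}_{\mathcal{A}}(\xi)\|_* \le C\sqrt{\min(K,N)}\,\|\mathcal{P}_{\mathcal{A}}(\xi)\|_F. \]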



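In addition, since we established (22), with $\gamma = \sqrt{4.5\,\mu_{\max}^2 KN/L}$ given by Lemma 1, for all $Z \in \mathrm{Null}(\mathcal{A})$, we have that

\[ \|\mathcal{P}_T\mathcal{P}_{\mathcal{A}^\perp}(\xi)\|_F^2 \le 2\gamma^2\|\mathcal{P}_{T^\perp}\mathcal{P}_{\mathcal{A}^\perp}(\xi)\|_F^2, \]

and as a result

\[ \|\mathcal{P}_{\mathcal{A}^\perp}(\xi)\|_F^2 \le (2\gamma^2 + 1)\|\mathcal{P}_{T^\perp}\mathcal{P}_{\mathcal{A}^\perp}(\xi)\|_F^2. \]

Revisiting (32), we have

\[ \|\tilde{X} - X_0\|_F^2 \le (2\gamma^2 + 1)\|\mathcal{P}_{T^\perp}\mathcal{P}_{\mathcal{A}^\perp}(\xi)\|_F^2 + \|\mathcal{P}_{\mathcal{A}}(\xi)\|_F^2 \le C(2\gamma^2 + 1)\min(K,N)\|\mathcal{P}_{\mathcal{A}}(\xi)\|_F^2 + \|\mathcal{P}_{\mathcal{A}}(\xi)\|_F^2, \]

and then absorbing all the constants into C,

\[ \|\tilde{X} - X_0\|_F \le C\gamma\sqrt{\min(K,N)}\,\|\mathcal{P}_{\mathcal{A}}(\xi)\|_F \le C\gamma\sqrt{\min(K,N)}\,\|\mathcal{A}^*(\mathcal{A}\mathcal{A}^*)^{-1}\|\,\|\mathcal{A}(\xi)\|_2. \]

Using (29) and (31), we obtain the final result

\[ \|\tilde{X} - X_0\|_F \le C\,\frac{\mu_{\max}}{\mu_{\min}}\sqrt{\min(K,N)}\,\delta. \tag{34} \]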

3.4 Key lemmas 



Lemma 1 ($\mathcal{A}\mathcal{A}^*$ is well conditioned). Let $\mathcal{A}$ be the linear operator defined earlier, with coherences $\mu_{\max}^2$ and $\mu_{\min}^2$ as defined in (11) and (12). Suppose that $\mathcal{A}$ is sufficiently underdetermined in that

L < ^'^^ (35) 

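for some constant $\beta \ge 1$. Then with probability exceeding $1 - O(L(NK)^{-\beta})$, the eigenvalues of $\mathcal{A}\mathcal{A}^*$ obey

\[ 0.48\,\mu_{\min}^2\frac{NK}{L} \le \lambda_{\min}(\mathcal{A}\mathcal{A}^*) \le \lambda_{\max}(\mathcal{A}\mathcal{A}^*) \le 4.5\,\mu_{\max}^2\frac{NK}{L}. \]

The proof of Lemma 1 in Section 5 decomposes $\mathcal{A}\mathcal{A}^*$ as a sum of independent random matrices, and then applies a Chernoff-like bound discussed in Section 4.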
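The eigenvalue bounds in Lemma 1 can be examined on a synthetic instance. This is a sketch under stated assumptions: B is taken to be the first K columns of the unitary DFT (so that $\mu_{\min}^2 = \mu_{\max}^2 = 1$), the coefficients $\hat{c}_\ell[n]$ are iid circular complex Gaussian with unit variance, and the sizes are chosen so that NK is much larger than L.

```python
import numpy as np

rng = np.random.default_rng(5)
L, K, N = 64, 8, 1024  # illustrative sizes with N*K much larger than L

# Assumed example: B = first K columns of the unitary L x L DFT matrix
B = np.exp(-2j * np.pi * np.arange(L)[:, None] * np.arange(K)[None, :] / L) / np.sqrt(L)
# C[l, n] plays the role of c_hat_l[n]; assumed iid circular complex Gaussian
C = (rng.standard_normal((L, N)) + 1j * rng.standard_normal((L, N))) / np.sqrt(2)

# Matrix form A = [Delta_1 B  ...  Delta_N B], with Delta_n = diag(C[:, n])
A = (C[:, :, None] * B[:, None, :]).reshape(L, N * K)
eigs = np.linalg.eigvalsh(A @ A.conj().T)

# Coherences of B computed from its row energies
row_energy = np.sum(np.abs(B) ** 2, axis=1)
mu_max2 = (L / K) * row_energy.max()
mu_min2 = (L / K) * row_energy.min()
lower = 0.48 * mu_min2 * N * K / L
upper = 4.5 * mu_max2 * N * K / L

print(f"eigenvalue range: [{eigs.min():.1f}, {eigs.max():.1f}]")
print(f"Lemma 1 bounds  : [{lower:.1f}, {upper:.1f}]")
```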



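Lemma 2 (Conditioning on T). With the coherences $\mu_{\max}^2$ and $\mu_h^2$ defined in Section 1.3, let

\[ M = \max(\mu_{\max}^2 K, \mu_h^2 N). \tag{36} \]

Fix $\alpha \ge 1$. Choose the subsets $\Gamma_1, \ldots, \Gamma_P$ described in Section 3.2 so that they have size

\[ |\Gamma_p| = Q = C'_\alpha \cdot M \log(NK)\log(M), \tag{37} \]

where $C'_\alpha = O(\alpha)$ is a constant chosen below, and such that (25) holds. Then the linear operators $\mathcal{A}_1, \ldots, \mathcal{A}_P$ defined in Section 3.2 will obey

\[ \max_{1 \le p \le P} \Bigl\| \mathcal{P}_T\mathcal{A}_p^*\mathcal{A}_p\mathcal{P}_T - \frac{Q}{L}\mathcal{P}_T \Bigr\| \le \frac{Q}{2L}, \]

with probability exceeding $1 - 3P(KN)^{-\alpha}$.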

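Corollary 2. Let $\mathcal{A}$ be the operator defined earlier and M be defined as in (36). Then there exists a constant $C'_\alpha = O(\alpha)$ such that

\[ L \ge C'_\alpha \cdot M \log(KN)\log(M) \tag{38} \]

implies

\[ \|\mathcal{P}_T\mathcal{A}^*\mathcal{A}\mathcal{P}_T - \mathcal{P}_T\| \le \frac{1}{2} \]

with probability exceeding $1 - 3(KN)^{-\alpha}$.

Lemma 3. Let M, Q, the $\Gamma_p$, and the $\mathcal{A}_p$ be the same as in Lemma 2. Let $W_p$ be as in (27), and define

\[ \mu_p^2 = L \max_{\ell \in \Gamma_{p+1}} \|W_p^* b_\ell\|_2^2. \tag{39} \]

Then there exists a constant $C_\alpha = O(\alpha)$ such that if

\[ L \ge C_\alpha M \log^{?}(KN) \log M \tag{40} \]

then

\[ \mu_p \le \frac{\mu_{p-1}}{2} \quad \text{for } p = 1, \ldots, P, \tag{41} \]

with probability exceeding $1 - 2L(KN)^{-\alpha}$.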


Lemma 4. Let $\alpha$, M, Q, the $\Gamma_p$, and the $\mathcal{A}_p$ be the same as in Lemma 2, and $\mu_p$ and $W_p$ be the same as in Lemma 3. Assume that (28) and (41) hold:

\[ \|W_{p-1}\|_F \le 2^{-p+1} \quad \text{and} \quad \mu_{p-1} \le 2^{-p+1}\mu_h. \]

Then with probability exceeding $1 - P(KN)^{-\alpha}$,

\[ \max_{1 \le p \le P} \Bigl\| \mathcal{A}_p^*\mathcal{A}_p W_{p-1} - \frac{Q}{L} W_{p-1} \Bigr\| \le 2^{-p}\,\frac{3Q}{4L}. \]

4 Concentration inequalities



Proving the key lemmas stated in Section 3.4 revolves around estimating the sizes of sums of different subexponential random variables. These random variables are either the absolute value of a sum of independent random scalars, the Euclidean norm of a sum of independent random vectors (or equivalently, the Frobenius norm of a sum of random matrices), or the operator norm (maximum singular value) of a sum of random linear operators. In this section, we very briefly overview the tools from probability theory that we will use to make these estimates. The essential tool is the recently developed matrix Bernstein inequality [37].

We start by recalling the classical scalar Bernstein inequality. A nice proof of the result in this form can be found in [38, Chapter 2].



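Proposition 2 (Scalar Bernstein, subexponential version). Let $z_1, \ldots, z_K$ be independent random variables with $\mathrm{E}[z_k] = 0$ and

\[ \mathrm{P}\{|z_k| > u\} \le C e^{-u/\sigma_k} \tag{42} \]

for some constants C and $\sigma_k$, $k = 1, \ldots, K$, with

\[ \sigma^2 = \sum_{k=1}^{K} \sigma_k^2 \quad \text{and} \quad B = \max_{1 \le k \le K} \sigma_k. \]

Then

\[ \mathrm{P}\{|z_1 + \cdots + z_K| > u\} \le 2\exp\Bigl(\frac{-u^2}{2C\sigma^2 + 2Bu}\Bigr), \]

and so

\[ |z_1 + \cdots + z_K| \le 2\max\bigl\{\sqrt{C}\,\sigma\sqrt{t + \log 2},\ 2B(t + \log 2)\bigr\} \]

with probability exceeding $1 - e^{-t}$.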
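The second statement in Proposition 2 follows from the first by plugging the deviation level into the tail bound. The short sketch below checks this consistency numerically; the scale parameters `sigma_k` and the constant `C` are hypothetical values chosen only for illustration.

```python
import math

def bernstein_tail(u, C, sigma2, B):
    """Right-hand side 2*exp(-u^2 / (2*C*sigma^2 + 2*B*u)) of the tail bound."""
    return 2.0 * math.exp(-u * u / (2.0 * C * sigma2 + 2.0 * B * u))

def bernstein_deviation(t, C, sigma2, B):
    """Deviation level 2*max{sqrt(C)*sigma*sqrt(t + log 2), 2*B*(t + log 2)}."""
    s = t + math.log(2.0)
    return 2.0 * max(math.sqrt(C * sigma2 * s), 2.0 * B * s)

# Hypothetical subexponential scale parameters sigma_k and constant C
sigma_k = [0.5, 1.0, 2.0, 0.25]
C = 1.0
sigma2 = sum(s * s for s in sigma_k)  # sigma^2 = sum of sigma_k^2
B = max(sigma_k)                      # B = max of sigma_k

checks = []
for t in [0.1, 1.0, 5.0, 20.0]:
    u = bernstein_deviation(t, C, sigma2, B)
    tail = bernstein_tail(u, C, sigma2, B)
    checks.append(tail <= math.exp(-t) * (1 + 1e-12))
    print(f"t = {t:5.1f}: deviation u = {u:8.3f}, tail bound = {tail:.3e}, e^-t = {math.exp(-t):.3e}")
```

The check works because $u \ge 2\sqrt{C}\sigma\sqrt{s}$ and $u \ge 4Bs$ with $s = t + \log 2$ together give $u^2 \ge s(2C\sigma^2 + 2Bu)$.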



To make the statement (and usage) of the concentration inequalities more compact in the vector and matrix case, we will characterize subexponential vectors and matrices using their Orlicz-1 norm.

Definition 1. Let Z be a random matrix. We will use $\|\cdot\|_{\psi_1}$ to denote the Orlicz-1 norm:

\[ \|Z\|_{\psi_1} = \inf_{u \ge 0}\{u : \mathrm{E}[\exp(\|Z\|/u)] \le 2\}, \]

where $\|Z\|$ is the spectral norm of Z. In the case where Z is a vector, we take $\|Z\| = \|Z\|_2$.

As the next basic result shows, the Orlicz-1 norm of a random variable can be systematically related to the rate at which its distribution function approaches 1 (i.e. $\sigma_k$ in (42)).

Lemma 5 (Lemma 2.2.1 in [38]). Let z be a random variable which obeys $\mathrm{P}\{|z| > u\} \le \alpha e^{-\beta u}$. Then $\|z\|_{\psi_1} \le (1 + \alpha)/\beta$.
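As a sanity check on Lemma 5, here is a short worked example added for illustration; the exponential distribution is an assumed test case, not one used in the proofs.

```latex
Let $|z|$ be exponentially distributed with rate $\beta$, so that
$\mathrm{P}\{|z| > u\} = e^{-\beta u}$, i.e. $\alpha = 1$. For $u > 1/\beta$,
\[
  \mathrm{E}\bigl[e^{|z|/u}\bigr]
  = \int_0^\infty \beta\, e^{-(\beta - 1/u)s}\, ds
  = \frac{\beta}{\beta - 1/u},
\]
which is at most $2$ exactly when $u \ge 2/\beta$. Hence
$\|z\|_{\psi_1} = 2/\beta = (1 + \alpha)/\beta$,
so the bound of Lemma 5 is attained in this case.
```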



Using these definitions, we have the following powerful tool for bounding the size of a sum of independent random vectors or matrices, each one of which is subexponential. This result is mostly due to [37], but appears in the form below in [24].

Proposition 3 (Matrix Bernstein, Orlicz norm version). Let $Z_1, \ldots, Z_Q$ be independent $K \times N$ random matrices with $\mathrm{E}[Z_q] = 0$. Let B be an upper bound on the Orlicz-1 norms:

\[ \max_{1 \le q \le Q} \|Z_q\|_{\psi_1} \le B, \]

and define

\[ \sigma^2 = \max\Bigl\{ \Bigl\|\sum_{q=1}^{Q} \mathrm{E}[Z_q Z_q^*]\Bigr\|,\ \Bigl\|\sum_{q=1}^{Q} \mathrm{E}[Z_q^* Z_q]\Bigr\| \Bigr\}. \tag{43} \]

Then there exists a constant C such that for all $t \ge 0$,

\[ \|Z_1 + \cdots + Z_Q\| \le C\max\Bigl\{ \sigma\sqrt{t + \log(K+N)},\ B\log\Bigl(\frac{\sqrt{Q}\,B}{\sigma}\Bigr)\bigl(t + \log(K+N)\bigr) \Bigr\}, \tag{44} \]

with probability at least $1 - e^{-t}$.

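Essential to establishing our stability result, Theorem 2, is bounding both the upper and lower eigenvalues of the operator $\mathcal{A}\mathcal{A}^*$. We do this in Lemma 1 with a relatively straightforward application of the following Chernoff-like bound for sums of random positive symmetric matrices.

Proposition 4 (Matrix Chernoff [37]). Let $X_1, \ldots, X_N$ be independent $L \times L$ random self-adjoint matrices whose eigenvalues obey

\[ 0 \le \lambda_{\min}(X_n) \le \lambda_{\max}(X_n) \le R \quad \text{almost surely}. \]

Define

\[ \mu_{\min} := \lambda_{\min}\Bigl(\sum_{n=1}^{N} \mathrm{E}[X_n]\Bigr) \quad \text{and} \quad \mu_{\max} := \lambda_{\max}\Bigl(\sum_{n=1}^{N} \mathrm{E}[X_n]\Bigr). \]

Then

\[ \mathrm{P}\Bigl\{ \lambda_{\min}\Bigl(\sum_{n=1}^{N} X_n\Bigr) \le t\mu_{\min} \Bigr\} \le L\,e^{-(1-t)^2\mu_{\min}/2R} \quad \text{for } t \in [0,1], \tag{45} \]

and

\[ \mathrm{P}\Bigl\{ \lambda_{\max}\Bigl(\sum_{n=1}^{N} X_n\Bigr) \ge t\mu_{\max} \Bigr\} \le L\Bigl(\frac{e}{t}\Bigr)^{t\mu_{\max}/R} \quad \text{for } t \ge e. \tag{46} \]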

5 Proof of key lemmas 
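5.1 Proof of Lemma 1

The proof of Lemma 1 is essentially an application of the matrix Chernoff bound in Proposition 4. Using the matrix form of $\mathcal{A}$,

\[ \mathcal{A} = [\Delta_1 B \ \cdots \ \Delta_N B], \]

we can write $\mathcal{A}\mathcal{A}^*$ as a sum of random matrices

\[ \mathcal{A}\mathcal{A}^* = \sum_{n=1}^{N} \Delta_n BB^* \Delta_n^*, \]

where $\Delta_n = \mathrm{diag}(\{\hat{c}_\ell[n]\}_\ell)$ as in the definition of $\mathcal{A}$. To apply Proposition 4 we will need to condition on the maximum of the magnitudes of the $\hat{c}_\ell[n]$ not exceeding a certain size. To this end, given an a (which we choose later), we define the event

\[ \Gamma_a = \Bigl\{ \max_{1 \le n \le N,\ 1 \le \ell \le L/2} |\hat{c}_\ell[n]| \le a \Bigr\}. \]

Then

\[ \mathrm{P}\{\lambda_{\max}(\mathcal{A}\mathcal{A}^*) > v\} \le \mathrm{P}\{\lambda_{\max}(\mathcal{A}\mathcal{A}^*) > v \mid \Gamma_a\}\,\mathrm{P}\{\Gamma_a\} + \mathrm{P}\{\Gamma_a^c\} \tag{47} \]
\[ \le \mathrm{P}\{\lambda_{\max}(\mathcal{A}\mathcal{A}^*) > v \mid \Gamma_a\} + \mathrm{P}\{\Gamma_a^c\}, \tag{48} \]

and similarly for $\mathrm{P}\{\lambda_{\min}(\mathcal{A}\mathcal{A}^*) < v\}$. Conditioned on $\Gamma_a$, the complex Gaussian random variables $\hat{c}_\ell[n]$ are still zero mean and independent; we denote these conditional random variables as $\hat{c}'_\ell[n]$, and set $\Delta'_n = \mathrm{diag}(\{\hat{c}'_\ell[n]\}_\ell)$, noting that

\[ \mathrm{E}[|\hat{c}'_\ell[n]|^2] = \mathrm{E}[|\hat{c}_\ell[n]|^2 \mid \Gamma_a] =: \sigma_a^2 \le 1. \]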

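We now apply Proposition 4 with

\[ R = \max_n \lambda_{\max}(\Delta'_n BB^* \Delta'^*_n) \le \max_n \lambda_{\max}(\Delta'_n)\,\lambda_{\max}(BB^*)\,\lambda_{\max}(\Delta'^*_n) \le a^2, \]

and

\[ \mu_{\max} := \lambda_{\max}\Bigl(\sum_{n=1}^{N} \mathrm{E}[\Delta'_n BB^* \Delta'^*_n]\Bigr) = N\lambda_{\max}\bigl(\mathrm{E}[\Delta'_n BB^* \Delta'^*_n]\bigr) = N\sigma_a^2 \max_\ell \|b_\ell\|_2^2 \le \mu_{\max}^2\frac{NK}{L}, \]

and

\[ \mu_{\min} := \lambda_{\min}\Bigl(\sum_{n=1}^{N} \mathrm{E}[\Delta'_n BB^* \Delta'^*_n]\Bigr) = N\sigma_a^2 \min_\ell \|b_\ell\|_2^2 = \sigma_a^2\mu_{\min}^2\frac{NK}{L}, \]

to get

\[ \mathrm{P}\Bigl\{ \lambda_{\min}(\mathcal{A}\mathcal{A}^*) \le \frac{\mu_{\min}}{2} \Bigm| \Gamma_a \Bigr\} \le L\exp\Bigl(-\frac{\sigma_a^2\mu_{\min}^2 NK}{8a^2 L}\Bigr), \]

where we have taken $t = 1/2$ in (45), and

\[ \mathrm{P}\bigl\{ \lambda_{\max}(\mathcal{A}\mathcal{A}^*) \ge e^{3/2}\mu_{\max} \bigm| \Gamma_a \bigr\} \le L\exp\Bigl(-\frac{e^{3/2}\mu_{\max}}{2a^2}\Bigr), \]

where we have taken $t = e^{3/2}$ in (46).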



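Taking a proportional to $\sqrt{\log(NK)}$, with a constant depending on some $\beta \ge 1$, results in $\sigma_a^2 \ge 0.97$ (for $N, K \ge 4$), and

\[ \mathrm{P}\{\Gamma_a^c\} \le L(NK)^{-\beta}. \]

Taking L as in (35) yields

\[ \mathrm{P}\Bigl\{ \lambda_{\max}(\mathcal{A}\mathcal{A}^*) > 4.5\,\mu_{\max}^2\frac{NK}{L} \Bigm| \Gamma_a \Bigr\} \le L(NK)^{-\beta} \quad \text{and} \quad \mathrm{P}\Bigl\{ \lambda_{\min}(\mathcal{A}\mathcal{A}^*) < 0.48\,\mu_{\min}^2\frac{NK}{L} \Bigm| \Gamma_a \Bigr\} \le L(NK)^{-\beta}, \]

which establishes the lemma (since $\mu_{\max}^2 \ge \mu_{\min}^2$).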



5.2 Proof of Lemma 2

The proof of the lemma and its corollary follow the exact same line of argumentation. We will start with the conditioning of the partial operators $\mathcal{A}_p$ on T; after this, the argument for the conditioning of the full operator $\mathcal{A}$ will be clear.

We start by fixing p and setting $\Gamma = \Gamma_p$. With $A_k = b_k c_k^*$, where the $b_k \in \mathbb{C}^K$ obey (10), (11), (14) and the $c_k \in \mathbb{C}^N$ are random vectors distributed as in (15), we are interested in how the random operator

\[ \mathcal{P}_T\mathcal{A}_p^*\mathcal{A}_p\mathcal{P}_T = \sum_{k \in \Gamma} \mathcal{P}_T(A_k) \otimes \mathcal{P}_T(A_k) \]

concentrates around its mean in the operator norm. This operator is a sum of independent random rank-1 operators on $K \times N$ matrices, and so we can use the matrix Bernstein inequality in Proposition 3 to estimate its deviation.

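Since $A_k = b_k c_k^*$, $\mathcal{P}_T(A_k)$ is the rank-2 matrix given by

\[ \mathcal{P}_T(A_k) = \langle b_k, h\rangle h c_k^* + \langle m, c_k\rangle b_k m^* - \langle b_k, h\rangle\langle m, c_k\rangle h m^* = h v_k^* + u_k m^*, \]

where $v_k = \langle h, b_k\rangle c_k$ and $u_k = \langle m, c_k\rangle(b_k - \langle b_k, h\rangle h) = \langle m, c_k\rangle(I - hh^*) b_k$.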
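The rank-2 decomposition can be confirmed numerically. In the sketch below the vectors `b` and `c` are arbitrary complex test vectors (an assumption for illustration), and the inner product convention $\langle x, y\rangle = y^* x$ is the one that makes the displayed formulas consistent.

```python
import numpy as np

rng = np.random.default_rng(6)
K, N = 5, 6  # illustrative dimensions

def cplx(n):
    """Random complex test vector."""
    return (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)

def inner(x, y):
    """Inner product <x, y> = y^* x."""
    return np.vdot(y, x)

h = rng.standard_normal(K); h /= np.linalg.norm(h)
m = rng.standard_normal(N); m /= np.linalg.norm(m)
b, c = cplx(K), cplx(N)

A = np.outer(b, c.conj())                # A_k = b_k c_k^*
Ph, Pm = np.outer(h, h), np.outer(m, m)
PT_A = Ph @ A + A @ Pm - Ph @ A @ Pm     # P_T(A_k)

v = inner(h, b) * c                      # v_k = <h, b_k> c_k
u = inner(m, c) * (b - inner(b, h) * h)  # u_k = <m, c_k> (I - h h^*) b_k
rank2 = np.outer(h, v.conj()) + np.outer(u, m.conj())  # h v_k^* + u_k m^*

print("max abs difference:", np.max(np.abs(PT_A - rank2)))
print("numerical rank of P_T(A_k):", np.linalg.matrix_rank(PT_A))
```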

The linear operator $\mathcal{P}_T(\cdot)$, since it maps $K \times N$ matrices to $K \times N$ matrices, can itself be represented as a $KN \times KN$ matrix that operates on a matrix that has been rasterized (in column order here) into a vector of length KN. We will find it convenient to denote these matrices in block form: $\{M(i,j)\}_{i,j}$, where $M(i,j)$ is a $K \times K$ matrix that occupies rows $(i-1)K + 1, \ldots, iK$ and columns $(j-1)K + 1, \ldots, jK$. Using this notation, we can write $\mathcal{P}_T$ as the matrix

\[ \mathcal{P}_T = \{hh^*\delta(i,j)\}_{i,j} + \{m[i]m[j]I\}_{i,j} - \{m[i]m[j]hh^*\}_{i,j}, \tag{49} \]

where $\delta(i,j) = 1$ if $i = j$ and is zero otherwise.

We will make repeated use of the following three facts about block matrices below:

1. Let $\mathcal{M}$ be an operator that we can write in matrix form as $\mathcal{M} = \{M\delta(i,j)\}_{i,j}$ for some $K \times K$ matrix M. Then the action of $\mathcal{M}$ on a matrix X is $\mathcal{M}(X) = MX$, and so $\|\mathcal{M}\| = \|M\|$. Also, $\mathcal{M}^*(X) = M^*X$.

2. Now suppose we can write $\mathcal{M}$ in matrix form as $\mathcal{M} = \{p[i]^* q[j] I\}_{i,j}$ for some $p, q \in \mathbb{C}^N$. Then the action of $\mathcal{M}$ on a matrix X is $\mathcal{M}(X) = Xqp^*$, and so $\|\mathcal{M}\| = \|qp^*\| = \|q\|_2\|p\|_2$. Also, $\mathcal{M}^*(X) = Xpq^*$.

3. Now let $\mathcal{M} = \{p[i]^* q[j] M\}_{i,j}$. Then the action of $\mathcal{M}$ on a matrix X is $\mathcal{M}(X) = MXqp^*$, and so $\|\mathcal{M}\| = \|M\|\,\|qp^*\| = \|M\|\,\|q\|_2\|p\|_2$. Also, $\mathcal{M}^*(X) = M^*Xpq^*$.
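The third fact (which contains the second as the special case $M = I$) can be checked with a Kronecker product, since the block matrix $\{p[i]^* q[j] M\}_{i,j}$ is the Kronecker product of the $N \times N$ matrix with entries $p[i]^* q[j]$ and M. The test matrices below are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
K, N = 3, 4  # illustrative dimensions

def cplx(*shape):
    """Random complex test array."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

M, X = cplx(K, K), cplx(K, N)
p, q = cplx(N), cplx(N)

def vec(Z):
    """Rasterize a matrix in column order."""
    return Z.flatten(order='F')

# Block (i, j) of `big` is conj(p[i]) * q[j] * M
big = np.kron(np.outer(p.conj(), q), M)

lhs = big @ vec(X)                         # block matrix applied to vec(X)
rhs = vec(M @ X @ np.outer(q, p.conj()))   # vec(M X q p^*)

op_norm = np.linalg.norm(big, 2)
expected_norm = np.linalg.norm(M, 2) * np.linalg.norm(q) * np.linalg.norm(p)
print("action mismatch:", np.max(np.abs(lhs - rhs)))
print("operator norm:", op_norm, "expected:", expected_norm)
```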

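We will break $\mathcal{P}_T(A_k) \otimes \mathcal{P}_T(A_k)$ into four different tensor products of rank-1 matrices, and treat each one in turn:

\[ \mathcal{P}_T(A_k) \otimes \mathcal{P}_T(A_k) = hv_k^* \otimes hv_k^* + hv_k^* \otimes u_k m^* + u_k m^* \otimes hv_k^* + u_k m^* \otimes u_k m^*. \tag{50} \]

To handle these terms in matrix form, note that if $u_1v_1^*$ and $u_2v_2^*$ are rank-1 matrices, with $u_i \in \mathbb{C}^K$ and $v_i \in \mathbb{C}^N$, then the operator given by their tensor product can be written as

\[ u_1v_1^* \otimes u_2v_2^* = \begin{bmatrix} v_1[1]^*v_2[1]\,u_1u_2^* & v_1[1]^*v_2[2]\,u_1u_2^* & \cdots & v_1[1]^*v_2[N]\,u_1u_2^* \\ v_1[2]^*v_2[1]\,u_1u_2^* & v_1[2]^*v_2[2]\,u_1u_2^* & \cdots & v_1[2]^*v_2[N]\,u_1u_2^* \\ \vdots & & \ddots & \vdots \\ v_1[N]^*v_2[1]\,u_1u_2^* & \cdots & & v_1[N]^*v_2[N]\,u_1u_2^* \end{bmatrix} = \{v_1[i]^*v_2[j]\,u_1u_2^*\}_{i,j}. \]

For the expectation of the sum, we compute the following:

\[ \mathrm{E}[hv_k^* \otimes hv_k^*] = |\langle h, b_k\rangle|^2\,\mathrm{E}[\{c_k[i]^*c_k[j]\,hh^*\}_{i,j}] = |\langle h, b_k\rangle|^2\{\delta(i,j)\,hh^*\}_{i,j}, \]

and

\[ \mathrm{E}[u_km^* \otimes u_km^*] = \mathrm{E}[|\langle m, c_k\rangle|^2]\,\{m[i]m[j](I - hh^*)b_kb_k^*(I - hh^*)\}_{i,j} = \{m[i]m[j](I - hh^*)b_kb_k^*(I - hh^*)\}_{i,j}, \]

since $\mathrm{E}[|\langle m, c_k\rangle|^2] = \|m\|_2^2 = 1$, and

\[ \mathrm{E}[hv_k^* \otimes u_km^*] = \langle b_k, h\rangle\{\mathrm{E}[c_k[i]^*\langle c_k, m\rangle]\,m[j]\,hb_k^*(I - hh^*)\}_{i,j} = \langle b_k, h\rangle\{m[i]m[j]\,hb_k^*(I - hh^*)\}_{i,j}, \]

\[ \mathrm{E}[u_km^* \otimes hv_k^*] = \langle h, b_k\rangle\{\mathrm{E}[c_k[j]\langle m, c_k\rangle]\,m[i]\,(I - hh^*)b_kh^*\}_{i,j} = \langle h, b_k\rangle\{m[i]m[j]\,(I - hh^*)b_kh^*\}_{i,j}. \]

A straightforward calculation combines these four results with (49) to verify that

\[ \mathrm{E}[\mathcal{P}_T(A_k) \otimes \mathcal{P}_T(A_k)] = \mathcal{P}_T\,\{b_kb_k^*\delta(i,j)\}_{i,j}\,\mathcal{P}_T. \]

In light of (25), this means

\[ \mathrm{E}\Bigl[\sum_{k \in \Gamma} \mathcal{P}_T(A_k) \otimes \mathcal{P}_T(A_k)\Bigr] = \frac{Q}{L}\mathcal{P}_T + \mathcal{G}, \tag{51} \]

where $\|\mathcal{G}\| \le Q/4L$.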



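We now derive tail bounds for how far the sum over $\Gamma$ for each of the terms in (50) deviates from their respective means. Starting with the first term, we use the compact notation

\[ Z_k = hv_k^* \otimes hv_k^* - \mathrm{E}[hv_k^* \otimes hv_k^*] \]

for each addend. To apply Proposition 3, we need to uniformly bound the size (Orlicz $\psi_1$ norm) of each individual $Z_k$ as well as the variance in (43). For the uniform size bound,

\[ \|Z_k\| = |\langle h, b_k\rangle|^2\,\bigl\|\{(c_k[i]^*c_k[j] - \delta(i,j))\,hh^*\}_{i,j}\bigr\| = |\langle h, b_k\rangle|^2\,\bigl\|\{(c_k[i]^*c_k[j] - \delta(i,j))\,I\}_{i,j}\{hh^*\delta(i,j)\}_{i,j}\bigr\| \]
\[ \le |\langle h, b_k\rangle|^2\,\|hh^*\|\,\|c_kc_k^* - I\| \le \frac{\mu_h^2}{L}\max(\|c_k\|_2^2, 1). \]

Applying Lemma 7,

\[ \mathrm{P}\{\max(\|c_k\|_2^2, 1) > u\} \le 1.2\,e^{-u/8N}, \]

and combined with Lemma 5 this means

\[ \|Z_k\|_{\psi_1} \le \frac{\mu_h^2}{L}\,\bigl\|\max(\|c_k\|_2^2, 1)\bigr\|_{\psi_1} \le C\,\frac{\mu_h^2 N}{L}. \]

For the variance, we need to compute $\mathrm{E}[Z_k^*Z_k]$. This will be easiest if we rewrite the action of $Z_k$ on a matrix X as

\[ Z_k(X) = |\langle h, b_k\rangle|^2\,hh^*X(c_kc_k^* - I), \]

and so

\[ Z_k^*Z_k(X) = |\langle h, b_k\rangle|^4\,\|h\|_2^2\,hh^*X(c_kc_k^* - I)^2, \]

and

\[ \mathrm{E}[Z_k^*Z_k(X)] = |\langle h, b_k\rangle|^4\,\|h\|_2^2\,hh^*X\,\mathrm{E}[(c_kc_k^* - I)^2] = N|\langle h, b_k\rangle|^4\,hh^*X, \]

and finally

\[ \Bigl\|\sum_{k \in \Gamma} \mathrm{E}[Z_k^*Z_k]\Bigr\| = N\sum_{k \in \Gamma}|\langle h, b_k\rangle|^4 \le \frac{\mu_h^2 N}{L}\sum_{k \in \Gamma}|\langle h, b_k\rangle|^2 \le \frac{5\mu_h^2 NQ}{4L^2}, \]

where we have used (25) in the last step. Collecting these results and applying Proposition 3 with $t = \alpha\log(KN)$ yields

\[ \Bigl\|\sum_{k \in \Gamma} hv_k^* \otimes hv_k^* - \mathrm{E}[hv_k^* \otimes hv_k^*]\Bigr\| \le C_\alpha\,\frac{\sqrt{\mu_h^2 N\log(KN)}}{L}\max\Bigl\{\sqrt{Q},\ \sqrt{\mu_h^2 N\log(KN)}\,\log(\mu_h^2 N)\Bigr\}, \tag{52} \]

with probability exceeding $1 - (KN)^{-\alpha}$.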
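For the sum over the $u_km^* \otimes u_km^*$ term in (50), set

\[ Z_k = (|\langle m, c_k\rangle|^2 - 1)\,\{m[i]m[j](I - hh^*)b_kb_k^*(I - hh^*)\}_{i,j}; \]

then using the fact that $\|I - hh^*\| \le 1$ (since $\|h\|_2 = 1$), we have

\[ \|Z_k\| = \bigl||\langle m, c_k\rangle|^2 - 1\bigr|\,\|(I - hh^*)b_k\|_2^2\,\|m\|_2^2 \le \bigl||\langle m, c_k\rangle|^2 - 1\bigr|\,\|b_k\|_2^2 \le \bigl||\langle m, c_k\rangle|^2 - 1\bigr|\,\frac{\mu_{\max}^2 K}{L}. \]

This is again a subexponential random variable whose size we can characterize using Lemma 9:

\[ \bigl\||\langle m, c_k\rangle|^2 - 1\bigr\|_{\psi_1} \le C \quad \text{and so} \quad \|Z_k\|_{\psi_1} \le C\,\frac{\mu_{\max}^2 K}{L}. \]

To bound the variance in (43), we again write out the action of $Z_k$ on an arbitrary $K \times N$ matrix X:

\[ Z_k(X) = (|\langle m, c_k\rangle|^2 - 1)\,(I - hh^*)b_kb_k^*(I - hh^*)Xmm^*, \]

and so

\[ \mathrm{E}[Z_k^*Z_k(X)] = \mathrm{E}[(|\langle m, c_k\rangle|^2 - 1)^2]\,\|(I - hh^*)b_k\|_2^2\,(I - hh^*)b_kb_k^*(I - hh^*)Xmm^* = \|(I - hh^*)b_k\|_2^2\,(I - hh^*)b_kb_k^*(I - hh^*)Xmm^*, \]

where in the last step we have used the fact that $|\langle m, c_k\rangle|^2$ is a chi-square random variable with two degrees of freedom with variance $\mathrm{E}[(|\langle m, c_k\rangle|^2 - 1)^2] = 1$. This gives us

\[ \Bigl\|\sum_{k \in \Gamma}\mathrm{E}[Z_k^*Z_k]\Bigr\| = \Bigl\|\sum_{k \in \Gamma}\|(I - hh^*)b_k\|_2^2\,(I - hh^*)b_kb_k^*(I - hh^*)\Bigr\| \le \max_{k \in \Gamma}\bigl(\|(I - hh^*)b_k\|_2^2\bigr)\,\Bigl\|\sum_{k \in \Gamma}(I - hh^*)b_kb_k^*(I - hh^*)\Bigr\| \]
\[ \le \frac{\mu_{\max}^2 K}{L}\,\Bigl\|\sum_{k \in \Gamma}b_kb_k^*\Bigr\| \le \frac{5\mu_{\max}^2 KQ}{4L^2}. \]

Collecting these results and applying Proposition 3 with $t = \alpha\log(KN)$ yields

\[ \Bigl\|\sum_{k \in \Gamma} u_km^* \otimes u_km^* - \mathrm{E}[u_km^* \otimes u_km^*]\Bigr\| \le C_\alpha\,\frac{\sqrt{\mu_{\max}^2 K\log(KN)}}{L}\max\Bigl\{\sqrt{Q},\ \sqrt{\mu_{\max}^2 K\log(KN)}\,\log(\mu_{\max}^2 K)\Bigr\}, \tag{53} \]

with probability exceeding $1 - (KN)^{-\alpha}$.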



The two remaining cross terms in (50) are adjoints of one another, so they will have the same operator norm. We now set

\[ Z_k = \langle h, b_k\rangle\,\{m[i]\,(c_k[j]\langle m, c_k\rangle - m[j])\,(I - hh^*)b_kh^*\}_{i,j}, \]

and so the action of $Z_k$ on an arbitrary matrix X is given by

\[ Z_k(X) = \langle h, b_k\rangle\,(I - hh^*)b_kh^*X(c_kc_k^* - I)mm^*, \]

from which we can see

\[ \|Z_k\| \le |\langle h, b_k\rangle|\,\|b_k\|_2\,\|(c_kc_k^* - I)m\|_2 \le \frac{\mu_h\mu_{\max}\sqrt{K}}{L}\,\|(c_kc_k^* - I)m\|_2. \]

From Lemmas 10 and 5, we see that the random variable $\|(c_kc_k^* - I)m\|_2$ is subexponential with $\bigl\|\|(c_kc_k^* - I)m\|_2\bigr\|_{\psi_1} \le C\sqrt{N}$, and so

\[ \|Z_k\|_{\psi_1} \le C\,\frac{\mu_h\mu_{\max}\sqrt{KN}}{L}. \]

For the variance in (43), we need to bound the sizes of both $Z_k^*Z_k$ and $Z_kZ_k^*$. Starting with the former, we have

\[ \mathrm{E}[Z_k^*Z_k(X)] = |\langle h, b_k\rangle|^2\,\|(I - hh^*)b_k\|_2^2\,hh^*X\,\mathrm{E}[(c_kc_k^* - I)mm^*(c_kc_k^* - I)], \]
and then applying Lemma [TT] yields 



Y,\{h,hk)f\\{i-hh*)h^f^ 



ker 
ker 

,2 



< 



< 



ker 



4L2 ■ 



For ZkZ*, 



E[ZkZ;{X)] = \{h,bk)Wl-hh*)bkbUl-hh*)Xmm*E[{ckcl-lf]mm*, 



and then applying Lemma [8] yields 



ker 



< N 



\ker J 
Y,\{Kbk)?bkhl 



< 



< 



ker 



L 



ker 



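Collecting these results and applying Proposition 3 with $t = \alpha\log(KN)$ and $M = \max\{\mu_{\max}^2 K, \mu_h^2 N\}$ yields

\[ \Bigl\|\sum_{k \in \Gamma} Z_k\Bigr\| \le C_\alpha\,\frac{\sqrt{M\log(NK)}}{L}\max\Bigl\{\sqrt{Q},\ \sqrt{M\log(NK)}\,\log(M)\Bigr\}, \tag{54} \]

with probability exceeding $1 - (KN)^{-\alpha}$.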
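Using the triangle inequality

\[ \Bigl\|\mathcal{P}_T\mathcal{A}_p^*\mathcal{A}_p\mathcal{P}_T - \frac{Q}{L}\mathcal{P}_T\Bigr\| \le \bigl\|\mathcal{P}_T\mathcal{A}_p^*\mathcal{A}_p\mathcal{P}_T - \mathrm{E}[\mathcal{P}_T\mathcal{A}_p^*\mathcal{A}_p\mathcal{P}_T]\bigr\| + \Bigl\|\mathrm{E}[\mathcal{P}_T\mathcal{A}_p^*\mathcal{A}_p\mathcal{P}_T] - \frac{Q}{L}\mathcal{P}_T\Bigr\|, \]

we can combine (51) with (52), (53), and (54) to establish that

\[ \Bigl\|\mathcal{P}_T\mathcal{A}_p^*\mathcal{A}_p\mathcal{P}_T - \frac{Q}{L}\mathcal{P}_T\Bigr\| \le C_\alpha\,\frac{\sqrt{M\log(NK)}}{L}\max\Bigl\{\sqrt{Q},\ \sqrt{M\log(NK)}\,\log(M)\Bigr\} + \frac{Q}{4L}, \]

with probability exceeding $1 - 3(KN)^{-\alpha}$. With Q chosen as in (37), this becomes

\[ \Bigl\|\mathcal{P}_T\mathcal{A}_p^*\mathcal{A}_p\mathcal{P}_T - \frac{Q}{L}\mathcal{P}_T\Bigr\| \le C_\alpha\,\frac{Q}{L}\max\Bigl\{\frac{1}{\sqrt{C'_\alpha\log M}},\ \frac{1}{C'_\alpha}\Bigr\} + \frac{Q}{4L} \le \frac{Q}{2L} \]

for $C'_\alpha$ chosen appropriately. Applying the union bound establishes the lemma.

To prove the corollary, we take $\Gamma = \{1, \ldots, L\}$ and $Q = L$ above. In this case, we will have $\sum_{k \in \Gamma} b_kb_k^* = I$, and so $\mathcal{G} = 0$ in (51). We have

\[ \|\mathcal{P}_T\mathcal{A}^*\mathcal{A}\mathcal{P}_T - \mathcal{P}_T\| \le C_\alpha\max\Bigl\{\sqrt{\frac{M\log(NK)}{L}},\ \frac{M\log(NK)\log(M)}{L}\Bigr\} \]

with probability exceeding $1 - 3(KN)^{-\alpha}$. Then taking L as in (38) will guarantee the desired conditioning.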

5.3 Proof of Lemma 3

We start by fixing $\ell \in \Gamma_{p+1}$ and estimating $\|W_p^*b_\ell\|_2$. We can re-write $W_p$ as a sum of independent random matrices: since $W_{p-1} \in T$, $\mathcal{P}_T(W_{p-1}) = W_{p-1}$ and

Wp = Vt (^A;ApWp-i - jWp-i^ 

\kerp keFp J \kerp J 

keTp \kerp J 



For the second term above 

\kerp J 2 \kerp 



w;be\\2 < 




+ 






keTp 


2 





L 



(55) 



J2 bkblWp.i - |Wp_i 

keVp 



E ^^^^ - f ^ 



Wp_i||F 



^ 2-P+Vl^A^Q 
4L3/2 
where we have used (25) and the fact that the Frobenius norms of the $W_p$ decrease geometrically with p; see (28).





< 


J2 


+ 




keVp 


2 


keTp 


2 


keTp 



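The first term in (55) is the norm of a sum of independent zero-mean random vectors, which we will bound using Propositions 2 and 3. We set $Z_k = b_kb_k^*W_{p-1}(c_kc_k^* - I)$ and $w_k = W_{p-1}^*b_k$, and expand $b_\ell^*\mathcal{P}_T(Z_k)$ as

\[ b_\ell^*\mathcal{P}_T(Z_k) = \langle h, b_\ell\rangle\langle b_k, h\rangle\,w_k^*(c_kc_k^* - I) + \langle b_k, b_\ell\rangle\,w_k^*(c_kc_k^* - I)mm^* - \langle h, b_\ell\rangle\langle b_k, h\rangle\,w_k^*(c_kc_k^* - I)mm^*, \]

and so

\[ \Bigl\|\sum_{k \in \Gamma_p}\mathcal{P}_T(Z_k)^*b_\ell\Bigr\|_2 \le \Bigl\|\sum_{k \in \Gamma_p}\mathbf{z}_k\Bigr\|_2 + \Bigl|\sum_{k \in \Gamma_p}z_k\Bigr|, \tag{56} \]

where the $\mathbf{z}_k$ are independent random vectors, and the $z_k$ are independent random scalars:

\[ \mathbf{z}_k = \langle b_\ell, h\rangle\langle h, b_k\rangle\,(c_kc_k^* - I)w_k, \qquad z_k = \langle b_k, (I - hh^*)b_\ell\rangle\,\langle(c_kc_k^* - I)m, w_k\rangle. \]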
Using Lemma [T2j we have a tail bound for each term in the scalar sum: 

P {kfcl > A} < 2e • exp (-- .. ... — . . 

V \\wkh\{bkAl - hh*)bi>)\ 

Applying the scalar Bernstein inequality (Proposition [2]) with 



and 



B = va^yL\\wk\\2\{bk,{I -hh*)bi>)\ < — 



a^=Yl \\wk\\l\{bk,il-hh*)be)\ 

keTp 

<'3^Y,\{bk,{I-hh*)W)\'' 



L 



kev^ 



4L2 



and t = a log{KN) tells us that 



J2 

keVp 



< C' 



L3/2 



max{VQ,/xiV/Clog(KAr)}, (57) 



with probability at least 1 — (KN) 

For the vector term in (jSG)), we apply Lemmas 10 andlslto see that 



N/cllvi < CVN\\wk\\2\{bi,h){h,bk) 



<C'- 



L3/2 
For the variance terms, we calculate

J2 m^k] = E \{hM)f\{bk,h)\'wlE[{c,cl- If]wk 

^nJ2 \{h,be)\^\{bk,h)mwkg (by Lemma U) 



and 



Thus 



,2 ..2] 



L2 



fcer„ 



< 



keTp 



4L3 



E \{hM)\'\{bk,h)\^E[{ckcl-I)wkwUckcl-I)] 



/cer„ 



<^EK^^'^)I' 



(by Lemma 11 ) 



L2 



< 



4L3 



5^ ^fc 



< 



C« ^^-^^^^f^j"^^^^^ max { VQ, log(M.) V^ki(^} (58) 



with probability at least 1 — (KN) ^. 



Combining (57) and (58) and taking the union bound over all $\ell \in \Gamma_{p+1}$ yields

CaVMQ CaM log{K N),/ 13 log M 



L "^-^ L 

with probability exceeding $1 - 2Q(KN)^{-\alpha}$. Then taking the union bound over $1 \le p \le P$ establishes the lemma.



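5.4 Proof of Lemma 4

We start by fixing p and writing

\[ \Bigl\|\mathcal{A}_p^*\mathcal{A}_pW_{p-1} - \frac{Q}{L}W_{p-1}\Bigr\| \le \bigl\|\mathcal{A}_p^*\mathcal{A}_pW_{p-1} - \mathrm{E}[\mathcal{A}_p^*\mathcal{A}_pW_{p-1}]\bigr\| + \Bigl\|\mathrm{E}[\mathcal{A}_p^*\mathcal{A}_pW_{p-1}] - \frac{Q}{L}W_{p-1}\Bigr\|. \]

We will derive a concentration inequality to bound the first term, and use (25) for the second. We can write the first term above as the spectral norm of a sum of random rank-1 matrices:

\[ \mathcal{A}_p^*\mathcal{A}_pW_{p-1} - \mathrm{E}[\mathcal{A}_p^*\mathcal{A}_pW_{p-1}] = \sum_{k \in \Gamma_p}Z_k, \qquad Z_k = b_kb_k^*W_{p-1}(c_kc_k^* - I). \tag{59} \]

We will use Proposition 3 to estimate the size of this random sum; we proceed by calculating the key quantities involved. With $w_k = W_{p-1}^*b_k$, we can bound the size of each term in the sum as

\[ \|Z_k\| = \|b_kb_k^*W_{p-1}(c_kc_k^* - I)\| = \|b_k\|_2\,\|(c_kc_k^* - I)w_k\|_2 \le \mu_{\max}\sqrt{\frac{K}{L}}\,\|(c_kc_k^* - I)w_k\|_2, \]

and then applying Lemmas 10 and 5 yields

\[ \|Z_k\|_{\psi_1} \le C\mu_{\max}\sqrt{\frac{KN}{L}}\,\|w_k\|_2 \le C\,\frac{\mu_{\max}\mu_{p-1}\sqrt{KN}}{L}. \]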


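For the variance terms, we calculate

\[ \Bigl\|\sum_{k \in \Gamma_p}\mathrm{E}[Z_k^*Z_k]\Bigr\| = \Bigl\|\sum_{k \in \Gamma_p}\|b_k\|_2^2\,\mathrm{E}[(c_kc_k^* - I)w_kw_k^*(c_kc_k^* - I)]\Bigr\| = \sum_{k \in \Gamma_p}\|b_k\|_2^2\,\|w_k\|_2^2 \quad (\text{by Lemma 11}) \]
\[ \le \frac{\mu_{\max}^2 K}{L}\sum_{k \in \Gamma_p}\|w_k\|_2^2 \le \frac{5\mu_{\max}^2 KQ}{4L^2}\,\|W_{p-1}\|_F^2 \quad (\text{using (25)}), \]

and

\[ \Bigl\|\sum_{k \in \Gamma_p}\mathrm{E}[Z_kZ_k^*]\Bigr\| = \Bigl\|\sum_{k \in \Gamma_p}b_kw_k^*\,\mathrm{E}[(c_kc_k^* - I)^2]\,w_kb_k^*\Bigr\| = N\,\Bigl\|\sum_{k \in \Gamma_p}\|w_k\|_2^2\,b_kb_k^*\Bigr\| \quad (\text{by Lemma 8}) \]
\[ \le \frac{\mu_{p-1}^2 N}{L}\,\Bigl\|\sum_{k \in \Gamma_p}b_kb_k^*\Bigr\| \le \frac{5\mu_{p-1}^2 NQ}{4L^2}. \]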


Then with M — max. [niK, ^^N^ , we apply Proposition |3] with t — a\og{KN) to get 

\\A;ApWp.i - E[A;ApWp.i]\\ < Ca 2-P VMlogiKN) j^g^ ^Mlog(i^7V) log(M)} , 



with probability exceeding 1 — (KN) ^. With Q as in (37), this becomes 



\\A;ApWp-i - E[A;ApWp-i]\\ < Ca 2-P I max 



< 2-P 



Q_ 
4L' 
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for an appropriate choice of C^. Thus 

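$$\begin{aligned}
\left\| \mathcal{A}_p^*\mathcal{A}_p W_{p-1} - \frac{Q}{L} W_{p-1} \right\|
&\le \left\| \mathcal{A}_p^*\mathcal{A}_p W_{p-1} - \mathrm{E}[\mathcal{A}_p^*\mathcal{A}_p W_{p-1}] \right\| + \left\| \mathrm{E}[\mathcal{A}_p^*\mathcal{A}_p W_{p-1}] - \frac{Q}{L} W_{p-1} \right\| \\
&\le 2^{-p} \frac{Q}{4L} + \left\| \sum_{k \in \Gamma_p} b_k b_k^* - \frac{Q}{L} I \right\| \, \|W_{p-1}\|_F \\
&\le 2^{-p} \frac{Q}{4L} + 2^{-p+1} \frac{Q}{4L}.
\end{aligned}$$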

Applying the union bound over all p = 1, . . . , P establishes the lemma. 



6 Supporting Lemmas 



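Lemma 6. Let $c_k \in \mathbb{C}^N$ be normally distributed as in (15), and let $u \in \mathbb{C}^N$ be an arbitrary vector.
Then $|\langle c_k, u \rangle|^2$ is a chi-square random variable with two degrees of freedom and

$$\mathrm{P}\left\{ |\langle c_k, u \rangle|^2 > \lambda \right\} \le e^{-\lambda / \|u\|_2^2}.$$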
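The tail bound in Lemma 6 can be checked by simulation. The sketch below is only illustrative: it assumes that (15) amounts to i.i.d. standard complex Gaussian entries (real and imaginary parts each of variance 1/2) and ignores any symmetry constraints on $c_k$; the helper names `complex_gaussian` and `inner` and the test vector `u` are hypothetical choices. Under that assumption $|\langle c_k, u \rangle|^2$ is exponentially distributed with mean $\|u\|_2^2$, so the empirical tail should sit close to the bound.

```python
import math
import random

def complex_gaussian(n, rng):
    """Draw n i.i.d. CN(0, 1) entries (assumed model for c_k)."""
    s = math.sqrt(0.5)  # real and imaginary parts each have variance 1/2
    return [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(n)]

def inner(x, y):
    """Inner product <x, y> = y^* x."""
    return sum(xi * yi.conjugate() for xi, yi in zip(x, y))

rng = random.Random(0)
u = [complex(1.0, 0.5), complex(0.0, -0.3), complex(2.0, 0.0)]  # arbitrary vector
norm_sq = sum(abs(x) ** 2 for x in u)
lam = 1.5 * norm_sq
trials = 20000

# Count how often |<c, u>|^2 exceeds lam.
hits = 0
for _ in range(trials):
    c = complex_gaussian(len(u), rng)
    if abs(inner(c, u)) ** 2 > lam:
        hits += 1

est = hits / trials
bound = math.exp(-lam / norm_sq)
print(f"empirical tail {est:.4f}, Lemma 6 bound {bound:.4f}")
```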



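Lemma 7. Let $c_k \in \mathbb{C}^N$ be normally distributed as in (15). Then

$$\mathrm{P}\left\{ \|c_k\|_2^2 > Nu \right\} \le 1.2 \, e^{-u/8} \quad \text{for all } u \ge 0, \tag{60}$$

and since $1.2 \, e^{-1/(8N)} > 1$ for all $N \ge 1$,

$$\mathrm{P}\left\{ \max(\|c_k\|_2^2, 1) > Nu \right\} \le 1.2 \, e^{-u/8}.$$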



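Proof. It is well-known (see, for example, [39]) that

$$\mathrm{P}\left\{ \|c_k\|_2^2 > N(1 + \lambda) \right\} \le \begin{cases} e^{-\lambda^2/8}, & 0 \le \lambda \le 1 \\ e^{-\lambda/8}, & \lambda \ge 1 \end{cases} \ \le \ 1.05 \, e^{-\lambda/8}, \quad \lambda \ge 0. \tag{61}$$

Plugging in $\lambda = u - 1$ above yields

$$\mathrm{P}\left\{ \|c_k\|_2^2 > Nu \right\} \le 1.2 \, e^{-u/8}, \quad u \ge 1.$$

Since $1.2 \, e^{-1/8} > 1$, the bound above can be extended for all $u \ge 0$.

□

Lemma 8. Let $c_k \in \mathbb{C}^N$ be normally distributed as in (15). Then

$$\mathrm{E}[(c_k c_k^* - I)^2] = NI.$$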



Proof. Using the expansion

$$(c_k c_k^* - I)^2 = \|c_k\|_2^2 \, c_k c_k^* - 2 c_k c_k^* + I,$$

we see that the only non-trivial term is $R = \|c_k\|_2^2 \, c_k c_k^*$. We compute the expectation of an entry
in this matrix as

$$\mathrm{E}[R(i,j)] = \sum_{n=1}^{N} \mathrm{E}\left[ |c_k[n]|^2 \, c_k[i] \, c_k[j]^* \right] = \begin{cases} 0, & i \ne j \\ \sum_{n=1}^{N} \mathrm{E}\left[ |c_k[n]|^2 |c_k[i]|^2 \right], & i = j. \end{cases}$$

For the addends in the diagonal term,

$$\mathrm{E}\left[ |c_k[n]|^2 |c_k[i]|^2 \right] = \begin{cases} \mathrm{E}[|c_k[n]|^4] = 2, & n = i \\ 1, & n \ne i, \end{cases}$$

where the calculation for $n = i$ relies on the fact that $\mathrm{E}[|c_k[n]|^4]$ is the second moment of a chi-square
random variable with two degrees of freedom. Thus $\mathrm{E}[R] = (N + 1) I$, and

$$\mathrm{E}[(c_k c_k^* - I)^2] = (N + 1) I - 2I + I = NI.$$

□
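The identity of Lemma 8 lends itself to a quick Monte Carlo check. The following is a minimal sketch, assuming (as a simplification of (15)) i.i.d. standard complex Gaussian entries with $\mathrm{E}[|c_k[n]|^2] = 1$; the function names and the choice $N = 3$ are hypothetical. It averages $(c_k c_k^* - I)^2$ over many draws and reports the largest entrywise deviation from $NI$.

```python
import random

def complex_gaussian(n, rng):
    """Draw n i.i.d. CN(0, 1) entries (assumed model for c_k)."""
    s = 0.5 ** 0.5  # real and imaginary parts each have variance 1/2
    return [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(n)]

def mean_squared_deviation(n, trials, rng):
    """Average (c c^* - I)^2 over `trials` draws.

    Uses the expansion (c c^* - I)^2 = (||c||^2 - 2) c c^* + I.
    """
    acc = [[0j] * n for _ in range(n)]
    for _ in range(trials):
        c = complex_gaussian(n, rng)
        scale = sum(abs(x) ** 2 for x in c) - 2.0
        for i in range(n):
            for j in range(n):
                acc[i][j] += scale * c[i] * c[j].conjugate()
    # Divide by the number of trials and add back the identity term.
    return [[acc[i][j] / trials + (1.0 if i == j else 0.0) for j in range(n)]
            for i in range(n)]

N = 3
avg = mean_squared_deviation(N, 20000, random.Random(1))
max_err = max(abs(avg[i][j] - (N if i == j else 0.0))
              for i in range(N) for j in range(N))
print(f"largest entrywise deviation from N*I: {max_err:.3f}")
```

The deviation shrinks roughly like the inverse square root of the number of trials, as expected for a sample mean.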



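Lemma 9. Let $c_k \in \mathbb{C}^N$ be normally distributed as in (15), and let $v$ be an arbitrary vector. Then
$\mathrm{E}[|\langle c_k, v \rangle|^2] = \|v\|_2^2$ and

$$\mathrm{P}\left\{ \left| |\langle c_k, v \rangle|^2 - \|v\|_2^2 \right| > \lambda \right\} \le 2.1 \exp\left( -\frac{\lambda}{8 \|v\|_2^2} \right).$$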



Proof. A slight variation of (61) gives us that



x.ri,/ m2 n „2i M f2e-^'/8lHli 0<A<1 



The lemma follows from combining these two cases into one subexponential bound.

□

Lemma 10. Let $c_k \in \mathbb{C}^N$ be normally distributed as in (15), and let $v \in \mathbb{C}^N$ be an arbitrary vector.
Then

$$\mathrm{P}\left\{ \|(c_k c_k^* - I) v\|_2 > \lambda \right\} \le 3 \exp\left( -\frac{\lambda}{\sqrt{8N} \, \|v\|_2} \right).$$
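Proof. We have

$$\|(c_k c_k^* - I) v\|_2 = \|\langle v, c_k \rangle c_k - v\|_2 \le |\langle v, c_k \rangle| \, \|c_k\|_2 + \|v\|_2.$$

For the first term above, we have for any $\tau > 0$,

$$\begin{aligned}
\mathrm{P}\left\{ |\langle v, c_k \rangle| \, \|c_k\|_2 > \lambda \sqrt{N} \|v\|_2 \right\}
&\le \mathrm{P}\left\{ |\langle v, c_k \rangle| > \sqrt{\lambda} \|v\|_2 / \tau \right\} + \mathrm{P}\left\{ \|c_k\|_2 > \tau \sqrt{\lambda N} \right\} \\
&= \mathrm{P}\left\{ |\langle v, c_k \rangle|^2 > \lambda \|v\|_2^2 / \tau^2 \right\} + \mathrm{P}\left\{ \|c_k\|_2^2 > \tau^2 \lambda N \right\}.
\end{aligned}$$

We can then use the fact that $|\langle v, c_k \rangle|^2$ is a chi-squared random variable along with (60) above to
derive the following tail bound:

$$\mathrm{P}\left\{ |\langle v, c_k \rangle| \, \|c_k\|_2 > \lambda \sqrt{N} \|v\|_2 \right\} \le e^{-\lambda/\tau^2} + 1.05 \, e^{-\tau^2 \lambda / 8} = 2.05 \, e^{-\lambda/\sqrt{8}},$$

where we have chosen $\tau^2 = \sqrt{8}$. Thus

$$\mathrm{P}\left\{ |\langle v, c_k \rangle| \, \|c_k\|_2 + \|v\|_2 > \lambda \right\} \le 2.05 \, e^{1/\sqrt{8N}} \, e^{-\lambda / (\sqrt{8N} \|v\|_2)}.$$

□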
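As an informal check of Lemma 10, the empirical tail of $\|(c_k c_k^* - I) v\|_2$ can be compared against $3 \exp(-\lambda / (\sqrt{8N} \|v\|_2))$. This is a sketch under the assumption that (15) reduces to i.i.d. standard complex Gaussian entries; the vector `v`, the thresholds, and the helper names are hypothetical. Since the lemma is an upper bound obtained through a union bound, the empirical frequencies are expected to fall below it, not to match it.

```python
import math
import random

def complex_gaussian(n, rng):
    """Draw n i.i.d. CN(0, 1) entries (assumed model for c_k)."""
    s = math.sqrt(0.5)
    return [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(n)]

def deviation_norm(c, v):
    """||(c c^* - I) v||_2, computed as ||<v, c> c - v||_2 with <v, c> = c^* v."""
    ip = sum(vi * ci.conjugate() for vi, ci in zip(v, c))
    return math.sqrt(sum(abs(ip * ci - vi) ** 2 for ci, vi in zip(c, v)))

rng = random.Random(2)
v = [complex(0.5, 0.0), complex(0.0, -1.0), complex(0.25, 0.25), complex(2.0, 0.0)]
N = len(v)
v_norm = math.sqrt(sum(abs(x) ** 2 for x in v))
trials = 20000

# Draw the norms once and reuse them for every threshold.
samples = [deviation_norm(complex_gaussian(N, rng), v) for _ in range(trials)]

results = []
for lam in (10.0, 20.0, 40.0):
    est = sum(1 for x in samples if x > lam) / trials
    bound = 3.0 * math.exp(-lam / (math.sqrt(8 * N) * v_norm))
    results.append((lam, est, bound))
    print(f"lambda={lam:5.1f}  empirical={est:.4f}  bound={bound:.4f}")
```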
Lemma 11. Let $c_k \in \mathbb{C}^N$ be normally distributed as in (15), and let $v \in \mathbb{C}^N$ be an arbitrary vector.
Then

$$\mathrm{E}[(c_k c_k^* - I) v v^* (c_k c_k^* - I)] = \|v\|_2^2 I.$$



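Proof. We have

$$\begin{aligned}
\mathrm{E}[(c_k c_k^* - I) v v^* (c_k c_k^* - I)] &= \mathrm{E}\left[ |\langle v, c_k \rangle|^2 c_k c_k^* - c_k c_k^* v v^* - v v^* c_k c_k^* + v v^* \right] \\
&= \mathrm{E}\left[ |\langle v, c_k \rangle|^2 c_k c_k^* \right] - v v^*.
\end{aligned}$$

Let $R(i,j)$ be the entries of the first matrix above:

$$\begin{aligned}
R(i,j) &= \mathrm{E}\left[ |\langle v, c_k \rangle|^2 \, c_k[i] \, c_k[j]^* \right] \\
&= \sum_{n_1, n_2} v[n_1]^* v[n_2] \, \mathrm{E}\left[ c_k[n_1] \, c_k[n_2]^* \, c_k[i] \, c_k[j]^* \right].
\end{aligned}$$

On the diagonal, where $i = j$, all of the terms in the sum above are zero except when $n_1 = n_2$, and
so

$$R(i,i) = \sum_{n=1}^{N} |v[n]|^2 \, \mathrm{E}\left[ |c_k[n]|^2 |c_k[i]|^2 \right].$$

Using the fact that

$$\mathrm{E}\left[ |c_k[n]|^2 |c_k[i]|^2 \right] = \begin{cases} 2, & n = i \\ 1, & n \ne i, \end{cases}$$

we see that $R(i,i) = |v[i]|^2 + \|v\|_2^2$. Off the diagonal, where $i \ne j$, we see immediately that
$\mathrm{E}[c_k[n_1] c_k[n_2]^* c_k[i] c_k[j]^*]$ will be zero unless one of two (non-overlapping) conditions holds:
$(n_1 = i, \ n_2 = j)$ or $(n_1 = j, \ n_2 = i)$. Thus

$$R(i,j) = v[i]^* v[j] \, \mathrm{E}[c_k[i]^2] \, \mathrm{E}[(c_k[j]^*)^2] + v[j]^* v[i] \, \mathrm{E}[|c_k[j]|^2] \, \mathrm{E}[|c_k[i]|^2].$$

Note the lack of absolute values in the first term on the right above; in fact, since the $c_k[i]$ have
uniformly distributed phase, $\mathrm{E}[c_k[i]^2] = \mathrm{E}[c_k[j]^2] = 0$, and so $R(i,j) = v[i] v[j]^*$. As such

$$\mathrm{E}[(c_k c_k^* - I) v v^* (c_k c_k^* - I)] = \mathrm{E}\left[ |\langle v, c_k \rangle|^2 c_k c_k^* \right] - v v^* = v v^* + \|v\|_2^2 I - v v^* = \|v\|_2^2 I.$$

□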
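Lemma 11 can likewise be verified numerically. The sketch below assumes i.i.d. standard complex Gaussian entries for $c_k$ (a simplification of (15)); the vector `v` and the function names are hypothetical. It uses the fact that $(c_k c_k^* - I) v v^* (c_k c_k^* - I) = g g^*$ with $g = \langle v, c_k \rangle c_k - v$, averages $g g^*$ over many draws, and compares the result with $\|v\|_2^2 I$.

```python
import random

def complex_gaussian(n, rng):
    """Draw n i.i.d. CN(0, 1) entries (assumed model for c_k)."""
    s = 0.5 ** 0.5
    return [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(n)]

def mean_outer(v, trials, rng):
    """Average g g^* over draws of c, where g = (c c^* - I) v = <v, c> c - v."""
    n = len(v)
    acc = [[0j] * n for _ in range(n)]
    for _ in range(trials):
        c = complex_gaussian(n, rng)
        ip = sum(vi * ci.conjugate() for vi, ci in zip(v, c))  # <v, c> = c^* v
        g = [ip * ci - vi for ci, vi in zip(c, v)]
        for i in range(n):
            for j in range(n):
                acc[i][j] += g[i] * g[j].conjugate()
    return [[a / trials for a in row] for row in acc]

v = [complex(1.0, 0.0), complex(0.0, 0.5), complex(-0.5, 0.0)]
n = len(v)
v_norm_sq = sum(abs(x) ** 2 for x in v)  # equals 1.5 for this choice of v
avg = mean_outer(v, 40000, random.Random(3))
max_err = max(abs(avg[i][j] - (v_norm_sq if i == j else 0.0))
              for i in range(n) for j in range(n))
print(f"largest entrywise deviation from ||v||^2 * I: {max_err:.3f}")
```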



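Lemma 12. Let $c_k \in \mathbb{C}^N$ be normally distributed as in (15), and let $u, v \in \mathbb{C}^N$ be arbitrary vectors.
Then

$$\mathrm{P}\left\{ |\langle c_k, v \rangle \langle u, c_k \rangle - \langle u, v \rangle| > \lambda \right\} \le 2e \cdot \exp\left( -\frac{\lambda}{\|u\|_2 \|v\|_2} \right).$$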



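Proof. For any $t > 0$,

$$\begin{aligned}
\mathrm{P}\left\{ |\langle c_k, v \rangle \langle u, c_k \rangle| > \lambda \right\}
&\le \mathrm{P}\left\{ |\langle c_k, v \rangle| > t \right\} + \mathrm{P}\left\{ |\langle u, c_k \rangle| > \lambda / t \right\} \\
&= \mathrm{P}\left\{ |\langle c_k, v \rangle|^2 > t^2 \right\} + \mathrm{P}\left\{ |\langle u, c_k \rangle|^2 > \lambda^2 / t^2 \right\} \\
&\le \exp\left( -\frac{t^2}{\|v\|_2^2} \right) + \exp\left( -\frac{\lambda^2}{t^2 \|u\|_2^2} \right).
\end{aligned}$$

Choosing $t^2 = \lambda \|v\|_2 / \|u\|_2$ yields

$$\mathrm{P}\left\{ |\langle c_k, v \rangle \langle u, c_k \rangle| > \lambda \right\} \le 2 \exp\left( -\frac{\lambda}{\|u\|_2 \|v\|_2} \right),$$

and so

$$\begin{aligned}
\mathrm{P}\left\{ |\langle c_k, v \rangle \langle u, c_k \rangle - \langle u, v \rangle| > \lambda \right\}
&\le \mathrm{P}\left\{ |\langle c_k, v \rangle \langle u, c_k \rangle| > \lambda - \|u\|_2 \|v\|_2 \right\} \\
&\le 2 \exp\left( -\frac{\lambda}{\|u\|_2 \|v\|_2} + 1 \right) \\
&= 2e \cdot \exp\left( -\frac{\lambda}{\|u\|_2 \|v\|_2} \right).
\end{aligned}$$

□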
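The bound of Lemma 12 can be compared with simulated tails in the same way. This is a minimal sketch, again assuming i.i.d. standard complex Gaussian entries in place of (15); the vectors `u` and `v`, the thresholds, and the helper names are hypothetical. Thresholds are expressed as multiples of $\|u\|_2 \|v\|_2$, so the bound reads $2e \cdot e^{-t}$ for $\lambda = t \|u\|_2 \|v\|_2$.

```python
import math
import random

def complex_gaussian(n, rng):
    """Draw n i.i.d. CN(0, 1) entries (assumed model for c_k)."""
    s = math.sqrt(0.5)
    return [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(n)]

def inner(x, y):
    """Inner product <x, y> = y^* x."""
    return sum(xi * yi.conjugate() for xi, yi in zip(x, y))

def norm(x):
    return math.sqrt(sum(abs(xi) ** 2 for xi in x))

rng = random.Random(4)
u = [complex(1.0, 0.0), complex(0.0, 2.0), complex(-1.0, 0.0)]
v = [complex(0.5, 0.5), complex(1.0, 0.0), complex(0.0, 0.0)]
scale = norm(u) * norm(v)
uv = inner(u, v)
trials = 20000

# Sample |<c, v><u, c> - <u, v>| once and reuse it for every threshold.
samples = []
for _ in range(trials):
    c = complex_gaussian(len(u), rng)
    samples.append(abs(inner(c, v) * inner(u, c) - uv))

results = []
for t in (2.0, 4.0, 6.0):
    lam = t * scale
    est = sum(1 for x in samples if x > lam) / trials
    bound = 2.0 * math.e * math.exp(-t)
    results.append((lam, est, bound))
    print(f"lambda={lam:6.2f}  empirical={est:.4f}  bound={bound:.4f}")
```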